ABSTRACT

The assumptions, logic, and operations of an Information Retrieval
System Simulation Model are described in this report. The model has been
programmed in FORTRAN II for the IBM 1620. Two model variations, input
and output examples, and some possibilities for improvement and development
are given. Data reduction and analysis programs, operating upon the
output of the simulation model, are explained and a set of sample outputs
for these programs is given. Also included are listings of the latest
programs developed.

I. INTRODUCTION


One object of Contract Nonr 3818 (00) is the evaluation of information
retrieval systems. To facilitate such evaluation, efforts are being made to
develop an ideal model of these systems. The ideal model will be a simulation
that will yield quantitative measurements of system performance. It will be
adaptable to any specific system. For the present, the primary measurement
has been the time the system requires to respond to a given request for
information. Response will be either a whole or partial answer or indication that
no answer is available. In addition to evaluating existing systems, the model
should be useful in determining specific requirements for new information
retrieval systems concepts.

The model utilization process is shown in Figure 1. Data or specifications
from the system being simulated are required as inputs to the computer, as
shown by example in Appendix A. The model simulation program utilizes the
system input data to produce the simulation data cards, from which the
simulation output is printed (Appendix B), and these cards are used as input to the
Data Reduction Program and the Analysis Program (Chapter III). The output
of the Data Reduction Program, which consists of three pages of summary
statistics printed subject to several data format requirements, is illustrated in
Section 2, Chapter III. The Analysis Program, using simulation data cards and
system acceptance criteria as inputs (Chapter III, Section 3), produces a response
time estimate, a confidence level, and a system acceptability decision.

At present the simulation model represents a research tool. When completed
it will be useful in studying the response time aspects of computer-based
information retrieval systems. Hence, in its current form its applicability to
practical problems is limited. The model logic, however, can be used either
to develop a simulation for a specific system or it can be extended to form a
general information retrieval system simulation.

Future  applications  of  the  final  simulation  model  should  be  considered  now 
so  that  the  subsequent  model  development  work  can  be  evaluated  in  the  proper 
light.  The  model  will  be  of  greatest  practical  value  when  used  as  a  system 
evaluation  tool.  As  such,  a  systems  engineer  (user)  would  use  it  to  estimate 
the response time of a given system. Once the user finds that the system
configuration does not satisfy the predetermined set of boundary conditions, he
could modify the system by changing characteristics such as the card read rate.
If the system is judged acceptable after this modification, then either a new piece
of equipment (e.g., with the new card read rate) must be obtained (that is, an
equipment requirement has been established) or a time-equivalent improvement
must be found, again by using the simulation. Once the input data for the
simulation has been prepared, the changing of any particular equipment characteristics
can be accomplished by simply changing the input data associated with that factor.

The simulation model requires two types of system data, namely event-time
data and selection data. In an existing system this data can be readily obtained by
observation. Selection probabilities can be estimated by considering the relative
frequency of use.


On  the  other  hand,  if  the  system  to  be  simulated  is  in  the  concept  stage  of 
development,  then  it  will  be  necessary  to  estimate  nearly  all  of  the  input  data. 
The  system  evaluator  could  use  the  model  to  compare  two  or  more  systems 
which are only "paper" systems. For example, a requirement might exist
such that the response time in an acceptable system must be less than T seconds
P percent of the time. To see whether any of a given set of systems satisfy
this  requirement,  the  user  would  simulate  each  (and  if  all  were  concepts  he 
would  have  to  estimate  their  corresponding  parameters)  and  run  the  results 
either  through  the  data  reduction  program  (and  determine  the  acceptance 
"manually")  or  through  the  analysis  program.  This  procedure  would  give  him 
a  set  of  systems  with  acceptable  response  times.  If  there  were  a  cost  constraint 
or  a  space  constraint  (or  some  other  limiting  factor)  then  another  selection 
procedure, based on these constraints, would be required.

As a research tool, the simulation model logic provides a basis for the
study of operations and equipment forming an information retrieval system.
The model can be extended to include a study of queueing of questions at the
operator's station (and consider single channel and multiple channel service),
or of real time retrieval systems where queries are queued based on a time to
process (or some other) priority. The model can also be extended to include
consideration of various record structures in different storage media. The
motivation for studies of this sort stems from the fact that much work has
been done in the area of, say, reconnaissance systems, or the obtaining of
information, but we are just beginning to examine the how's and why's and
where's of information retrieval systems. Many studies such as this are
needed before we can be as confident in our understanding of information
retrieval systems as we are of reconnaissance systems.

II. SIMULATION MODEL

One way of simulating an information retrieval system is to consider those
operations which must be performed by the parts of the system. In a library,
for example, certain steps are invariably followed in finding information.
These steps can be called "events." Figure 2 shows the primary event sequence
in  a  library  or  any  other  manual  retrieval  system.  In  this  sequence  the  analyst 
might  be  a  person  requesting  a  book  from  a  library.  He  would  first  decide  how 
he  is  going  to  ask  for  which  book.  He  would  then  make  his  request,  either 
orally  or  in  writing.  The  operator,  who  might  be  the  librarian,  would  then  inter¬ 
pret  the  question  or  request.  The  librarian  might  know  that  either  the  requested 
book  was  out  on  loan  or  that  the  library  never  had  the  book.  If  the  operator  is 
not  sure  (as  happens  in  most  cases),  he  would  look  for  a  loan  record  (after 
deciding  where  to  look  and  what  to  look  for). 

This event sequence would continue until the analyst (or book requester)
receives an answer which might be either the requested book, a different book
(a mistake), or a statement that the book is not currently available. Both the
first and third answers would be acceptable. The second answer would not be
acceptable, and the process might be repeated again and again until an acceptable
answer is received.

When the number of requests entered into the system becomes large or the
files to be searched become unmanageable, the manual retrieval system would
ordinarily be supplanted by a computer-based information retrieval system.
Several new events would now appear in the primary event sequence, as seen in
Figure 3. The operator in Figure 3 is acting as an interpreter (between analyst
and computer). No longer can he be simply the traditional librarian. Now he
must consider how he is to ask the computer to do his searching, how he is to
tell the computer of his request, how he wants his answer, and how he is to
receive the answer.

FIG. 3. PRIMARY EVENT SEQUENCE - COMPUTER RETRIEVAL SYSTEM

The operator may also have a choice to make in asking for information.
In this report a distinction is made between a question and a query. A question
is a request from the analyst while a query is a request by the operator. The
operator may need to use various types of queries at various times, such as
comparison, summary, and extremum. For a given question, he may require
several queries of each type in order to satisfy the request. Suppose, for
example, that the information in a record has been stored in four groups:


1. Person's name (field 1)

2. Service number (field 2)

3.  Rank  (field  3) 

4.  Station  (field  4) 


An  analyst  might  want  to  know,  for  example,  to  what  ship  an  individual  named 
CONGER  is  assigned.  Following  the  procedure  suggested  in  Figure  3,  the 
analyst  would  ask  the  operator  to  obtain  the  desired  information  from  the 
system.  The  operator  in  turn  might  fill  out  a  number  of  query  forms  to 
elicit  the  following  information: 


1.  CONGER 

2.  7500743 

3. RM2

4. USS ANTIETAM


In  a  query,  similar  to  the  one  below,  an  X  indicates  the  information  desired 
and  a  blank  means  the  information  is  not  wanted. 


1.  CONGER 

2.  - 

3. -

4. X

5. 1 (query type: comparison)

On the other hand the analyst may want to know the names of all the
personnel aboard the USS ANTIETAM, CVS-36. In this case the query form might
look like

1.  X 

2.  - 

3.  - 

4.  USS  ANTIETAM 

5. 2 (query type: summary)

More than one query can be generated by a question. The analyst might ask,
for example, whether CONGER's serial number is 7500742. In this case the
operator may form two queries, the first being


1. CONGER

2. X

3. -

4. -

5. 1


and  the  second 

1.  X 

2.  7500742 

3. -

4.  - 

5.  1 

Questions  such  as  "What  ship  has  the  most  personnel  aboard?"  would  cause 
an extremum query type to be generated. With such a query, all ships would be
examined,  the  number  of  personnel  aboard  each  ship  determined,  and  the  ship 
with  the  largest  number  printed  out  as  the  answer. 

The examples used are given purely for illustration of query types. The
model will provide for use of up to 10 query types by the system being simulated.
It is also assumed that each question asked by the analyst can be handled
differently by the system; that is, that it would be possible to establish "question
categories" and within each find that the system responds in essentially the
same way.

The  basic  model  logic  is  centered  on  the  response  time  measure.  In 
essence,  the  response  time  is  equal  to  the  sum  of  all  of  the  event  times  and 
associated  delays.  The  events  in  Figure  4  can  be  defined  as  follows: 

FORM QUESTION                  Determining what must be asked and
                               how it is to be asked

ASK QUESTION                   The process of asking the question

INTERPRET QUESTION             Determining whether this question
                               can be answered at this time

DETERMINE NUMBER OF QUERIES    Determining how many queries will
                               be necessary to answer this question

FORM QUERY                     Determining what the query must say

PREPARE QUERY                  The process of preparing the query

ENTER QUERY                    The process of entering the query

SEARCH FILE                    Examining the data base

TRANSFER RECORD                Shifting data prior to output

OUTPUT DATA                    Outputting selected data

PREPARE ANSWER                 Collecting output data relevant to
                               the given question

DELIVER ANSWER                 The process of delivery of the answer

INTERCEPT ANSWER               Comparing the given answer with the
                               expected answer

INFORM ANALYST                 Refusing a question

REPHRASE QUESTION              Determining how to ask what must be
                               asked

With the exception of the fourth event, all of these are basic time events;
that is, the time it takes to perform the particular event will contribute to the
overall time it takes to respond to the given request (response time).

Some of the contingency cases have been included in Figure 4. The first
contingency occurs when the request (question) posed by the analyst is not
acceptable. In this case the operator so informs the analyst, who then reforms
his question and again makes a request for information.

The second contingency occurs when the query is rejected by the system
logic; for example, when a simple coding error has been detected. This part
of the basic model logic can be expanded to include both query checking and
information checking (determining whether the system has any data on the
requested subject).

FIG. 4 - BASIC MODEL LOGIC


The third contingency occurs when the answer is rejected by the analyst.
Even though an answer has been received, as shown in Figure 4, the simulation
does not stop until the answer is acceptable, which means that the time addition
continues until END is reached.
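The way a contingency keeps adding time until acceptance can be sketched in a few lines. This is only an illustration in Python, not the report's FORTRAN II program: the function name `time_until_accepted` and its parameters are hypothetical, and a rejection is assumed to occur when a uniform random draw falls below the rejection probability.

```python
import random


def time_until_accepted(attempt_time, p_reject, rng=random):
    """Accumulate attempt times until one attempt is accepted.

    attempt_time: callable returning the time (seconds) of one attempt.
    p_reject:     assumed probability that an attempt is rejected.
    rng:          any object with a random() method returning [0, 1).
    Returns (total_time, number_of_attempts).
    """
    if not 0.0 <= p_reject < 1.0:
        raise ValueError("p_reject must be in [0, 1)")
    total = 0.0
    attempts = 0
    while True:
        total += attempt_time()
        attempts += 1
        # A draw below p_reject is a rejection, so the time keeps
        # adding up; any other draw ends the loop.
        if rng.random() >= p_reject:
            return total, attempts
```

With a rejection probability of 0.5 and a fixed 10-second attempt, two rejections followed by an acceptance give a total of 30 seconds over three attempts.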

Based on the logic shown in Figure 4, two models have been developed,
programmed, and run. These models differ in the way the number of queries
is selected as well as in the way the query types are selected. In the first
model (Mod I) a random number of queries of each type is selected; in the
second model (Mod II) the number of queries to be considered is selected first,
and then for each query a query type is selected (again at random according to
some selection rule).

Figure 5 shows the logic followed in Mod II. It has been assumed that a
particular time event will take anywhere from some time $T_1$ to some time $T_2$;
that is, it will take no less than $T_1$ and no more than $T_2$. Each event can have
its own time range. In the simulation all times are measured in seconds.

Rather than bogging down in a detailed problem in the early stage of model
development, we have assumed that the time taken by each event can be
represented by a uniform distribution, which is analogous (in the discrete case)
to saying that if a die is rolled and a 1 comes up, the event took a minimum
time, or, if a 2 comes up, the event took T seconds longer, or, finally, if a
6 comes up, the event took a maximum time (also assuming an unloaded die).
If a uniform time distribution is used and if $T_1$ represents the minimum time and
$T_2$ represents the maximum time, then the procedure for selection of a random
event time within this interval is to pick a random number R in the interval
from 0 to 1 and then substitute it in the following equation, where T represents
the selected event time:

$$T = T_1 + (T_2 - T_1) \cdot R \qquad (1)$$

Figure 6 shows the Mod II flow chart. Whenever "SELECT T" is required,
equation 1 is applied by picking R at random and then substituting corresponding
min and max event times in the equation. In this flow chart will also be seen
the statement

R = RAN (1.)

This statement calls for a subroutine which will generate a uniformly
distributed random number and then set the value of R equal to this number. A
different random number will be generated every time this subroutine is
used.
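The event-time selection rule, a uniform draw between a minimum and a maximum time, can be illustrated with a short Python sketch. The function name `select_event_time` is hypothetical, and Python's `random.random()` stands in for the report's `RAN(1.)` call as an assumed equivalent.

```python
import random


def select_event_time(t_min, t_max, rng=random):
    """Return an event time drawn uniformly between t_min and t_max.

    Applies T = t_min + (t_max - t_min) * R, where R is a uniform
    random number in [0, 1) supplied by rng.random().
    """
    if t_min > t_max:
        raise ValueError("t_min must not exceed t_max")
    r = rng.random()
    return t_min + (t_max - t_min) * r
```

For an event that takes between 5 and 15 seconds, a draw of R = 0.5 selects an event time of 10 seconds.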

Table  1  shows  that  the  event  sequence  in  the  model  differs  according  to 
the  type  of  entry  device  selected.  For  example,  the  system  being  simulated 
may  have  a  special  query  entry  device  that  enables  the  operator  to  enter  a 
query simply by pushing a button on a console (this would indicate to the
computer that a certain query type is desired) and then entering the specific data.
In this case the operator first forms his query (FORM QUERY), then pushes
a  button  (ENTER  QUERY),  and  then  enters  his  data  (ENTER  DATA).  Specific 
query  types  may  be  entered  by  certain  entry  devices  more  often  than  by  others 
(assuming  a  choice).  The  simulation  allows  for  this  preference  as  shown  in 
Table  2. 


TABLE 1. EVENT SEQUENCE VERSUS INPUT DEVICE

Column headings: DIRECT ENTRY, INDIRECT ENTRY; KEYBOARD, CARD, SPECIAL, CARD AND TAPE

Event                     Marks

FORM QUERY                X  X  X  X  X
PREPARE QUERY FORM        X  X  X  X
PREPARE QUERY CARDS       X  X
CHECK QUERY CARDS         X  X
ENTER QUERY               X  X  X
ENTER DATA                X
ENTER QUERY CARDS         X  X
TRANSFER QUERY TAPE       X  X

NOTE: AN X INDICATES THAT THE PARTICULAR EVENT WOULD OCCUR IF THAT INPUT DEVICE WERE USED.
TABLE 2. INPUT DEVICE SELECTION PROBABILITY MATRIX

                      Input Device Number
Query Type        1        2       . . .      N

   Q1            P11      P12      . . .     P1N
   Q2            P21      P22      . . .     P2N
   .              .        .                  .
   .              .        .                  .
   QM            PM1      PM2      . . .     PMN

NOTE: (1) Pij is the conditional probability of selecting input device j, given
          that query type i has been selected;
      (2) N is the number of input devices in the system;
      (3) M is the number of query categories (types).
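One common way to use such a matrix is to pick the device by comparing a uniform random number against the running sum of one row. The Python sketch below illustrates that idea under stated assumptions: the names `select_index` and `select_input_device` are hypothetical, indices are 1-based to match the table, and each row is assumed to sum to 1.

```python
def select_index(probabilities, r):
    """Return the 1-based index of the first cumulative sum exceeding r.

    probabilities: one row of selection probabilities (assumed to sum to 1).
    r:             a uniform random number in [0, 1).
    """
    cumulative = 0.0
    for index, p in enumerate(probabilities, start=1):
        cumulative += p
        if r < cumulative:
            return index
    # Guard against rounding when the row sums to slightly less than 1.
    return len(probabilities)


def select_input_device(pe, query_type, r):
    """Pick input device j given query type i, using row i of the matrix."""
    return select_index(pe[query_type - 1], r)
```

For example, with a row of (0.7, 0.3) for query type 1, any draw below 0.7 selects device 1 and any draw from 0.7 upward selects device 2.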

It was mentioned earlier that Mod II differs just slightly from Mod I. The
major difference is in the selection of the number of queries and the selection
of query types. In Mod II, the number of queries to be considered in a particular
run is selected first. Then for the first query, a query type is picked according
to a specified selection procedure, and this query is "run." If there are more
queries to be generated, then for the second query a query type is picked. In
this "picking," sampling with replacement is performed, thereby giving a constant
selection probability for each query type.
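The Mod II selection order, number of queries first and then one query type per query with replacement, can be sketched in Python. The function name `select_queries` is hypothetical, and the standard library's weighted sampling (`random.Random.choices`) is swapped in for the report's own selection procedure.

```python
import random


def select_queries(pn, pt, rng):
    """Select a number of queries, then a type for each query.

    pn:  pn[i-1] is the probability that i queries are needed.
    pt:  pt[i-1] is the probability of selecting query type i.
    rng: a random.Random instance.
    Returns (n, types), where types is a list of n query types (1-based).
    """
    # Step 1: the number of queries is selected first.
    n = rng.choices(range(1, len(pn) + 1), weights=pn, k=1)[0]
    # Step 2: each query gets a type, sampled with replacement so that
    # the selection probability of every type stays constant.
    types = rng.choices(range(1, len(pt) + 1), weights=pt, k=n)
    return n, types
```

Calling `select_queries([0.5, 0.3, 0.2], [0.6, 0.4], random.Random(1))` returns between one and three queries, each of type 1 or 2.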

There  is  a  slight  difference  in  the  Mod  II  output.  Since  Mod  I  was  primarily 
a  prototype,  the  program  for  Mod  II  includes  several  refinements  not  found  in 
Mod  I.  Mod  II  will  now  act  as  a  prototype  for  a  subsequent  model.  It  is  for  this 
reason  that  only  Mod  II  is  described  in  detail  in  this  report. 

The simulation requires two types of inputs: event times and selection
probabilities. Event time data describes the time range for a given event. For
example, it might take the operator anywhere from 5 to 15 seconds to interpret
a question given to him by an analyst. Selection probability data refers to the
observed usage of the various query types and I/O devices. For example, it
might have been observed that 60% of the time an on-line printer was used for
output of data, 20% of the time a console typewriter was used, and 20% of the
time an off-line printer was used.


Specific  input  requirements  can  be  seen  in  the  program  listing  while  a 
numerical  input  example  is  given  in  Appendix  A. 


TABLE 3. MOD II SYMBOL TABLE

Symbol        Meaning

A             Probability of rejecting a question
B             Probability of rejecting a query
C             Probability of rejecting an answer
D             Dummy variable
I             Dummy variable
IJ            Dummy variable
J             Dummy variable
K             Dummy variable
L             Dummy variable
M             Number of query types
NC            Iteration number
NI            Number of input devices
NO            Number of output devices
NQ            Number of queries (maximum)
NR            Number of iterations for the run
R             Random number dummy variable
RAN(1.)       Uniform random number generator call expression
S             Dummy variable
TT            Total time
CQC (L, 1)    Minimum time to check Ith query cards (CQC) for Kth device,
              where L = I + (K-1)M
CQC (L, 2)    Maximum time to check Ith query cards (CQC) for Kth device
DE (I)        Probability of selecting Ith output device
ED (L, 1)     Minimum time to enter Ith query data (ED) for Kth device
ED (L, 2)     Maximum time to enter Ith query data (ED) for Kth device
EQ (L, 1)     Minimum time to enter Ith query (EQ) for Kth device
EQ (L, 2)     Maximum time to enter Ith query (EQ) for Kth device
EQC (L, 1)    Minimum time to enter Ith query cards (EQC) for Kth device
EQC (L, 2)    Maximum time to enter Ith query cards (EQC) for Kth device
FQ (L, 1)     Minimum time to form Ith query (FQ) for Kth device
FQ (L, 2)     Maximum time to form Ith query (FQ) for Kth device
N (I)         Number of queries of Ith type
PE (I, J)     Probability of selecting Jth input device for Ith query type
PN (I)        Probability of I queries being selected, where I = 1, 2, ..., NQ
PQC (L, 1)    Minimum time to prepare Ith query cards (PQC) for Kth device
PQC (L, 2)    Maximum time to prepare Ith query cards (PQC) for Kth device
PQF (L, 1)    Minimum time to prepare Ith query form (PQF) for Kth device
PQF (L, 2)    Maximum time to prepare Ith query form (PQF) for Kth device
PT (I)        Probability of selecting query type I
T (I, 1)      Minimum time for Ith event
T (I, 2)      Maximum time for Ith event
TO (I, 1)     Minimum output time on Ith device
TO (I, 2)     Maximum output time on Ith device
TQT (L, 1)    Minimum time to transfer Ith query tape (TQT) for Kth device
TQT (L, 2)    Maximum time to transfer Ith query tape (TQT) for Kth device

2.  EXAMPLE 

To show how the simulation model works, assume, for example, a system
which has the following characteristics:

a. fixed user group;

b. fixed data type (one kind of data in a record);

c. five query categories;

d. magnetic tape data storage;

e. fixed record structure;

f. input devices: card to core and keyboard to core;

g. output devices: on-line printer, off-line printer, and console
typewriter.

Assume  also  that  the  method  of  operation  is  essentially  as  illustrated  in  Figure  4, 
that  is,  a  question  form  is  prepared  by  the  user  and  given  to  the  operator.  The 
operator  then  interprets  the  question,  and  if  the  question  is  acceptable  he  pre¬ 
pares  a  query  form.  The  query  is  then  entered  into  the  system.  The  output  of 
the  run  is  given  to  the  user.  The  contingency  cases  shown  in  Figure  4  are  also 
assumed  to  hold. 

Table 4 (Appendix A) shows the input data necessary for Mod II. The values
are just a numerical sample, not actual system data.


C     INFORMATION RETRIEVAL SYSTEM SIMULATION
C     MOD 2
C     PROGRAM LIMITS-10 QUERY TYPES, 5 INPUT DEVICES, 5 OUTPUT DEVICES
C     10 QUERIES PER QUESTION BASIC MAXIMUM
      DIMENSION T(10,2),PN(10),PT(10),PE(10,5),FQ(50,2),
     1PQF(50,2),PQC(50,2),CQC(50,2),EQ(50,2),ED(50,2),EQC(50,2),
     2TQT(50,2),DE(5),NTT(10),TO(5,2),ND(10)
  100 FORMAT(I5)
  101 FORMAT(4I3)
  102 FORMAT(2F10.4)
  103 FORMAT(10F6.3)
  104 FORMAT(10F6.3)
  105 FORMAT(5F6.3)
  106 FORMAT(2F10.4)
  107 FORMAT(2F10.4)
  108 FORMAT(2F10.4)
  109 FORMAT(2F10.4)
  110 FORMAT(2F10.4)
  111 FORMAT(2F10.4)
  112 FORMAT(2F10.4)
  113 FORMAT(2F10.4)
  114 FORMAT(F6.3,2F10.4)
  115 FORMAT(3F6.3)
  200 FORMAT(I4,F10.2,I4,10I3,10I3)
  300 FORMAT(10HDATA ERROR)
  301 FORMAT(42HCORRECT AND RE-ENTER ALL DATA - PUSH START)
C     READ RUN SIZE, COUNTS, AND THE TEN BASIC EVENT TIME RANGES
    1 READ 100,NR
      READ 101,M,NI,NO,NQ
      DO 2 I=1,10
      READ 102,T(I,1),T(I,2)
      IF(T(I,1)-T(I,2))2,2,900
    2 CONTINUE
C     READ SELECTION PROBABILITIES, EACH SET MUST SUM TO 1
      S=0
      READ 103,(PN(I),I=1,NQ)
      DO 3 I=1,NQ
    3 S=S+PN(I)
      IF(S-1.)900,4,900
    4 S=0
      READ 104,(PT(I),I=1,M)
      DO 5 I=1,M
    5 S=S+PT(I)
      IF(S-1.)900,6,900
    6 DO 8 I=1,M
      READ 105,(PE(I,J),J=1,NI)
      S=0
      DO 7 J=1,NI
    7 S=S+PE(I,J)
      IF(S-1.)900,8,900
    8 CONTINUE
C     READ DEVICE-DEPENDENT EVENT TIME RANGES, ROWS 1 TO L
      DO 9 I=1,L
      READ 106,FQ(I,1),FQ(I,2)
      IF(FQ(I,1)-FQ(I,2))9,9,900
    9 CONTINUE
      DO 10 I=1,L
      READ 107,PQF(I,1),PQF(I,2)
      IF(PQF(I,1)-PQF(I,2))10,10,900
   10 CONTINUE
      DO 11 I=1,L
      READ 108,PQC(I,1),PQC(I,2)
      IF(PQC(I,1)-PQC(I,2))11,11,900
   11 CONTINUE
      DO 12 I=1,L
      READ 109,CQC(I,1),CQC(I,2)
      IF(CQC(I,1)-CQC(I,2))12,12,900
   12 CONTINUE
      DO 13 I=1,L
      READ 110,EQ(I,1),EQ(I,2)
      IF(EQ(I,1)-EQ(I,2))13,13,900
   13 CONTINUE
      DO 14 I=1,L
      READ 111,ED(I,1),ED(I,2)
      IF(ED(I,1)-ED(I,2))14,14,900
   14 CONTINUE
      DO 15 I=1,L
      READ 112,EQC(I,1),EQC(I,2)
      IF(EQC(I,1)-EQC(I,2))15,15,900
   15 CONTINUE
      DO 16 I=1,L
      READ 113,TQT(I,1),TQT(I,2)
      IF(TQT(I,1)-TQT(I,2))16,16,900
   16 CONTINUE
C     READ OUTPUT DEVICE PROBABILITIES AND TIMES, THEN A, B, C
      S=0
      DO 17 I=1,NO
      READ 114,DE(I),TO(I,1),TO(I,2)
      IF(TO(I,1)-TO(I,2))17,17,900
   17 S=S+DE(I)
      IF(S-1.)900,18,900
   18 READ 115,A,B,C
      IF(A-1.)19,900,900
   19 IF(B-1.)20,900,900
   20 IF(C-1.)21,900,900
   21 NC=0
      KK=NI+NO
C     START OF ONE ITERATION, CLEAR TOTAL TIME AND COUNTERS
   22 TT=0
      NC=NC+1
      DO 23 I=1,KK
   23 ND(I)=0
      NQT=0
      DO 24 I=1,M
   24 NTT(I)=0
   25 TT=TT+T(1,1)+(T(1,2)-T(1,1))*RAN(1.)
   26 TT=TT+T(2,1)+(T(2,2)-T(2,1))*RAN(1.)
      TT=TT+T(3,1)+(T(3,2)-T(3,1))*RAN(1.)
C     QUESTION IS REJECTED WITH PROBABILITY A
      R=RAN(1.)
      IF(R-A)27,28,28
   27 TT=TT+T(4,1)+(T(4,2)-T(4,1))*RAN(1.)
      TT=TT+T(5,1)+(T(5,2)-T(5,1))*RAN(1.)
      GO TO 26
C     SELECT NUMBER OF QUERIES N FROM PN
   28 I=0
      S=0
      R=RAN(1.)
   29 I=I+1
      S=S+PN(I)
      IF(R-S)30,29,29
   30 N=I
      NQT=NQT+N
      J=0
   31 J=J+1
      IF(J-N-1)34,32,900
C     ALL N QUERIES DONE, ADD EVENTS 6 TO 8 AND PUNCH RESULTS
   32 TT=TT+T(6,1)+(T(6,2)-T(6,1))*RAN(1.)
      TT=TT+T(7,1)+(T(7,2)-T(7,1))*RAN(1.)
      TT=TT+T(8,1)+(T(8,2)-T(8,1))*RAN(1.)
      PUNCH 200,NC,TT,NQT,(NTT(I),I=1,M),(ND(I),I=1,KK)
C     ANSWER IS REJECTED WITH PROBABILITY C
      R=RAN(1.)
      IF(R-C)25,33,33
   33 IF(NC-NR)22,901,901
C     SELECT QUERY TYPE I FROM PT
   34 I=0
      S=0
      R=RAN(1.)
   35 I=I+1
      S=S+PT(I)
      IF(R-S)36,35,35
   36 NTT(I)=NTT(I)+1
C     SELECT INPUT DEVICE K FROM ROW I OF PE
      K=0
      S=0
      R=RAN(1.)
   37 K=K+1
      S=S+PE(I,K)
      IF(R-S)38,37,37
   38 ND(K)=ND(K)+1
      L=I+(K-1)*M
      TT=TT+FQ(L,1)+(FQ(L,2)-FQ(L,1))*RAN(1.)
   39 TT=TT+PQF(L,1)+(PQF(L,2)-PQF(L,1))*RAN(1.)
      TT=TT+PQC(L,1)+(PQC(L,2)-PQC(L,1))*RAN(1.)
      TT=TT+CQC(L,1)+(CQC(L,2)-CQC(L,1))*RAN(1.)
      TT=TT+EQ(L,1)+(EQ(L,2)-EQ(L,1))*RAN(1.)
      TT=TT+ED(L,1)+(ED(L,2)-ED(L,1))*RAN(1.)
      TT=TT+EQC(L,1)+(EQC(L,2)-EQC(L,1))*RAN(1.)
      TT=TT+TQT(L,1)+(TQT(L,2)-TQT(L,1))*RAN(1.)
C     QUERY IS REJECTED WITH PROBABILITY B
      R=RAN(1.)
      IF(R-B)39,40,40
   40 TT=TT+T(9,1)+(T(9,2)-T(9,1))*RAN(1.)
      TT=TT+T(10,1)+(T(10,2)-T(10,1))*RAN(1.)
C     SELECT OUTPUT DEVICE I FROM DE
      R=RAN(1.)
      S=0
      I=0
   41 I=I+1
      S=S+DE(I)
      IF(R-S)42,41,41
   42 TT=TT+TO(I,1)+(TO(I,2)-TO(I,1))*RAN(1.)
      I=I+NI
      ND(I)=ND(I)+1
      GO TO 31
  900 TYPE 300
      TYPE 301
  901 PAUSE
      GO TO 1
      END


For this example, the I/O devices are labeled as follows:

Input Device          Output Device

1 - card              1 - on-line printer
2 - keyboard          2 - off-line printer
                      3 - console typewriter

Since both input devices are of the direct entry type, the ENTER DATA
(ED) and the TRANSFER QUERY TAPE (TQT) events do not apply, and it will
be necessary to insert 20 blank cards for the ED matrix and 20 blank cards for
the TQT matrix.

3.  MODEL  DEVELOPMENT 

Two factors will be incorporated as the model develops. The first is
equipment characteristics, and the second is a requirement for more freedom in
specifying time data.

Equipment  Characteristics 

At  present  Mod  II  does  no  more  than  approximate  the  actual  operation  of  the 
equipment  in  a  system.  For  example,  in  entering  a  query,  only  the  time  range 
is  specified.  For  the  model  to  be  an  effective  evaluation  tool,  it  should  possess 
a  capability  for  assessing  variations  in  equipment,  say,  in  the  read  rate  of  a 
card  reader  so  that  the  sensitivity  of  the  response  time  could  be  studied  in  light 
of  this  variation.  If  all  other  factors  were  fixed,  then  a  threshold  could  be 
determined  for  this  variable  (read  rate). 

The  incorporation  of  equipment  characteristics  in  the  model  does  present 
several  problems.  For  example,  consider  the  factors  involved  with  the  entry 
of  data  (a  query)  by  way  of  a  direct  entry  keyboard.  This  event  can  be  described 
by  at  least  three  parameters;  entry  rate,  data  volume,  and  data  form  factor. 

The entry rate will be an equipment characteristic: the maximum number of
characters that can be entered per second, and hence it will be measured in
characters per second. Data volume will be a function of the type of query,
and it seems reasonable to assume that there will be a distribution of characters
for each type of query. Data volume, then, will be measured in characters. The
data form factor will relate to the complexity of the data, and it could be
assumed that this factor will remain constant for each type of query. The data
form factor will be a dimensionless quantity. If $R_e$ represents the entry rate,
$V(Q)$ the data volume of query Q, and $F(Q)$ the data form factor for query Q,
then the entry time, expressed in seconds, is given by

$$T_e(Q) = \frac{F(Q)\,V(Q)}{R_e}.$$

We can represent an adjusted input rate by $R_e(Q) = R_e / F(Q)$, which shows
the dependence of the input rate on the ability of an operator to enter data
variations resulting from query complexity. The best case, $R_e(Q) = R_e$, occurs
when the degree of complexity is minimum. The maximum of $F(Q)$ could be
such that $R_e(Q)$ equals the "hunt and peck" rate. An additional assumption can
be made about $F(Q)$: that there is sufficient volume of data in each query type
to make the factor meaningful.
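The keyboard entry relation can be sketched in a few lines of Python. This is only an illustration, not part of the FORTRAN II model: the function names are hypothetical, and the lower bound of 1 on the form factor is an assumption drawn from the remark that the best case is the unadjusted rate.

```python
def keyboard_entry_time(volume_chars, form_factor, entry_rate_cps):
    """Entry time in seconds, T_e(Q) = F(Q) * V(Q) / R_e."""
    if entry_rate_cps <= 0:
        raise ValueError("entry rate must be positive")
    if form_factor < 1:
        # Assumption: F(Q) = 1 is the minimum-complexity (best) case.
        raise ValueError("form factor is assumed to be at least 1")
    return form_factor * volume_chars / entry_rate_cps


def adjusted_entry_rate(form_factor, entry_rate_cps):
    """Adjusted input rate in characters per second, R_e(Q) = R_e / F(Q)."""
    return entry_rate_cps / form_factor


if __name__ == "__main__":
    # A 120-character query of moderate complexity on a 10 char/sec keyboard.
    print(keyboard_entry_time(120, 1.5, 10.0))
    print(adjusted_entry_rate(1.5, 10.0))
```

Holding the data volume and form factor fixed and varying only the entry rate gives the kind of sensitivity study described under Equipment Characteristics.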

The "enter query cards" event can be handled in a similar way. In this
case only two factors are needed: entry rate and data volume. The entry rate
would be the maximum number of cards which can be entered per second, and
hence would be measured in cards per second. The data volume factor would be a
function of the query type, and would be measured in cards. It can be assumed
that each query type will have associated with it a discrete distribution for
the number of cards necessary. If $R_e$ again represents the entry rate and $V(Q)$
the data volume of query type Q, then the entry time, expressed in seconds, is
given by

$$T_e(Q) = \frac{V(Q)}{R_e}.$$
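As a hedged illustration of the card entry relation, the sketch below computes the entry time for a fixed number of cards and the expected entry time when the number of cards follows a discrete distribution. The function names and the example distribution are hypothetical; the report does not give one.

```python
def card_entry_time(cards, rate_cards_per_sec):
    """Entry time in seconds, T_e(Q) = V(Q) / R_e, with V(Q) in cards."""
    if rate_cards_per_sec <= 0:
        raise ValueError("entry rate must be positive")
    return cards / rate_cards_per_sec


def expected_card_entry_time(card_distribution, rate_cards_per_sec):
    """Mean entry time when V(Q) has a discrete distribution.

    card_distribution maps a number of cards to its probability.
    """
    total = sum(card_distribution.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mean_cards = sum(n * p for n, p in card_distribution.items())
    return card_entry_time(mean_cards, rate_cards_per_sec)


if __name__ == "__main__":
    # Hypothetical query type needing 1, 2 or 3 cards; reader at 2 cards/sec.
    distribution = {1: 0.5, 2: 0.3, 3: 0.2}
    print(expected_card_entry_time(distribution, 2.0))
```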

It  can  be  seen  that  by  including  equipment  characteristics,  more  significant 
details  of  the  system  are  included  automatically,  thereby  making  the  model 
potentially  more  effective  and  the  results  more  reliable. 

Specifying  Time  Data 

In the present simulation model all time factors are uniformly distributed
random variables; the only allowance is that the domain can be an arbitrary
interval of time. For the actual time events, however, a uniform distribution
would really be a poor choice for an approximating probability density function
(pdf). Since the time data used in the simulation will represent observations of
the actual events (in most cases) and these data will be used to develop a
histogram (time vs. number of occurrences; see Figure 8) of the event, it
would seem natural to approximate this histogram to represent the event pdf.

The choice of the type of approximating curve is dependent on the degree of
accuracy desired. Confidence in the final results can at least be improved by
using a line segment approximation to the parts of the histogram (see Figure 7,
overlay). Approximation of the time distributions of the various events by sets
of line segments shows that the current model (with its uniformly distributed
random variables) is a special case.

The following pages discuss the derivation of an equation which will provide
a one-to-one transformation from a uniformly distributed random variable to a
variable having a pdf approximated by a given set of line segments. (Most
computer installations, including the IBM 1620 at HRB-Singer, have only this
subroutine for the generation of random numbers.) Also included is a listing
of the random number generator subroutine (which will be used in a subsequent
simulation model) along with a listing of a check program for this subroutine.

The  following  procedure  can  be  adopted  to  prepare  event  time  input  data  for 
the  subsequent  simulation  model: 

1. Sample the event; that is, time the event from its start to its finish
and record; repeat n - 1 times for a sample of size n;

2. Construct a histogram of the number of occurrences vs. time;

3. Draw an approximating curve (composed of line segments);

4. List end points of each line segment (ordered by increasing values of
the time coordinate).


For example, suppose that a particular event is observed 480 times and it
is found that the following is true:

Time Interval        Number of
(seconds)            occurrences

 0-7
 7-8
 8-9                 15
 9-10                25
10-11                35
11-12                45
12-13                56
13-14                65
14-15                73
15-16                70
16-17                50
17-18                30
18-19                10
19 and up             0

FIG. 7 - APPROXIMATING PDF (overlay for Figure 8)

FIG. 8 - EVENT TIME HISTOGRAM EXAMPLE (SAMPLE SIZE: 480; HORIZONTAL AXIS:
TIME IN SECONDS)

THE ABOVE HISTOGRAM IS AN EXAMPLE MEANT TO DISPLAY THE DATA HYPOTHETICALLY
OBTAINED BY OBSERVING A PARTICULAR EVENT 480 TIMES. THE HISTOGRAM DEPICTS THE
NUMBER OF TIMES THE EVENT OCCURRED WITHIN A CERTAIN TIME RANGE. FOR EXAMPLE,
35 TIMES THE EVENT TOOK A TIME BETWEEN 10 AND 11 SECONDS TO COMPLETE.

From this data a histogram like the one shown in Figure 8 can be
constructed. Next an approximating curve composed of line segments is drawn
as shown in the overlay for Figure 7. Finally, the end points, which in this
case are (7, 0), (15, 80), and (19, 0), are listed. These line segment end points
will then be entered into the computer in an appropriate way and will be used
to generate random time values obeying this approximating pdf.
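A short Python sketch can make the role of the end points concrete. It assumes the approximating curve has its apex at (15, 80), which is where the tabulated counts peak, and it evaluates the curve and the area under it. The function names are hypothetical.

```python
def segment_areas(points):
    """Trapezoid area under each line segment of the approximating curve."""
    return [(x1 - x0) * (y0 + y1) / 2.0
            for (x0, y0), (x1, y1) in zip(points, points[1:])]


def approximating_count(points, t):
    """Height of the line-segment curve at time t (zero outside the curve)."""
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= t <= x1:
            return y0 + (y1 - y0) * (t - x0) / (x1 - x0)
    return 0.0


if __name__ == "__main__":
    end_points = [(7.0, 0.0), (15.0, 80.0), (19.0, 0.0)]  # assumed apex at 15
    print(sum(segment_areas(end_points)))         # total area under the curve
    print(approximating_count(end_points, 10.5))  # midpoint of 10-11 seconds
```

With one-second intervals the total area under these segments is 480, the sample size, and the height at the midpoint of the 10-11 second interval is 35, the tabulated count for that interval. Dividing each height by the total area turns the curve into a pdf.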

The method followed in generating random numbers obeying a distribution
characterized by a set of line segments can be seen in the following material.
For simplicity first consider a distribution approximated by a single line
segment (the extension to N line segments is straightforward and is indicated in
the check program shown at the end of this chapter).

The equation for an arbitrary line segment is given by

$$f(x) = Ax + B, \qquad x \in [x_0, x_1],$$

where

$$A = \frac{f(x_1) - f(x_0)}{x_1 - x_0}, \qquad B = \frac{x_1 f(x_0) - x_0 f(x_1)}{x_1 - x_0}$$

(only real-valued, single-valued functions of a real variable are
considered). If this function is to be a probability density function (pdf), then
for

$$\int_{-\infty}^{\infty} f(x)\,dx = C,$$

C should equal 1. The function f can be normalized by dividing by C. The
resulting function, g, is given by

$$g(x) = \frac{A}{C}\,x + \frac{B}{C}. \tag{1}$$

The cumulative distribution function (cdf) for this function is

$$G(x) = \int_{x_0}^{x} g(t)\,dt = \int_{x_0}^{x} \left( \frac{A}{C}\,t + \frac{B}{C} \right) dt.$$

Hence,

$$G(x) = \frac{1}{C} \left[ \frac{A}{2}\,(x^2 - x_0^2) + B\,(x - x_0) \right], \qquad x \in [x_0, x_1], \tag{2}$$

where

$$C = \int_{x_0}^{x_1} f(x)\,dx = \int_{x_0}^{x_1} (Ax + B)\,dx = \frac{A}{2}\,(x_1^2 - x_0^2) + B\,(x_1 - x_0).$$

A well-known theorem states that if X is a random variable of the continuous
type having pdf g(x) and cdf G(x), then the random variable R = G(X) has a
uniform distribution with pdf

$$h(r) = \begin{cases} 1, & 0 < r < 1 \\ 0, & \text{elsewhere.} \end{cases}$$

Hence, it is possible to take a uniformly distributed random variable R and with
it arrive at a random variable X having a pdf which is a line segment. This can
be done by letting R = G(x) in equation 2 and solving for x:

$$R = \frac{A}{2C}\,(x^2 - x_0^2) + \frac{B}{C}\,(x - x_0) = \frac{A}{2C}\,x^2 + \frac{B}{C}\,x - \frac{A}{2C}\,x_0^2 - \frac{B}{C}\,x_0.$$

See, for example, Hogg and Craig, Introduction to Mathematical Statistics,
Macmillan, 1959, p. 157.
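Hence,

$$\frac{A}{2C}\,x^2 + \frac{B}{C}\,x - \left[ \frac{A}{2C}\,x_0^2 + \frac{B}{C}\,x_0 + R \right] = 0.$$

Solving for x yields

$$x = \frac{-\dfrac{B}{C} \pm \sqrt{\dfrac{B^2}{C^2} + \dfrac{2A}{C} \left( \dfrac{A}{2C}\,x_0^2 + \dfrac{B}{C}\,x_0 + R \right)}}{A/C} = \frac{-B \pm \sqrt{B^2 + A^2 x_0^2 + 2ABx_0 + 2ACR}}{A}. \tag{3}$$

Now let $y_0 = f(x_0)$ and $y_1 = f(x_1)$, so that

$$A = \frac{y_1 - y_0}{x_1 - x_0}, \qquad B = \frac{x_1 y_0 - x_0 y_1}{x_1 - x_0},$$

and

$$C = \frac{A}{2}\,(x_1^2 - x_0^2) + B\,(x_1 - x_0)$$

$$= \frac{1}{2} \left( \frac{y_1 - y_0}{x_1 - x_0} \right) (x_1^2 - x_0^2) + \left( \frac{x_1 y_0 - x_0 y_1}{x_1 - x_0} \right) (x_1 - x_0)$$

$$= \frac{1}{2(x_1 - x_0)} \left( x_1^2 y_1 + x_1^2 y_0 + x_0^2 y_1 + x_0^2 y_0 - 2 x_0 x_1 y_0 - 2 x_0 x_1 y_1 \right).$$

Hence,

$$C = \frac{(x_1 - x_0)(y_0 + y_1)}{2},$$

$$B^2 = \frac{x_1^2 y_0^2 - 2 x_0 x_1 y_0 y_1 + x_0^2 y_1^2}{(x_1 - x_0)^2},$$

$$A^2 x_0^2 = \frac{x_0^2 y_1^2 - 2 x_0^2 y_0 y_1 + x_0^2 y_0^2}{(x_1 - x_0)^2},$$

$$2ABx_0 = \frac{2 x_0 x_1 y_0 y_1 - 2 x_0^2 y_1^2 - 2 x_0 x_1 y_0^2 + 2 x_0^2 y_0 y_1}{(x_1 - x_0)^2},$$

$$2ACR = (y_1^2 - y_0^2)\,R.$$

Substituting in equation 3 yields after simplifying

$$B^2 + A^2 x_0^2 + 2ABx_0 + 2ACR = y_0^2 + (y_1^2 - y_0^2)\,R$$

and hence

$$x = \frac{x_0 y_1 - x_1 y_0 \pm (x_1 - x_0) \sqrt{y_0^2 + (y_1^2 - y_0^2)\,R}}{y_1 - y_0}. \tag{4}$$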

To determine which sign to choose, let $y_1 = y_0 + a$ and consider the case where
$a \neq 0$. Equation (4) now becomes

$$x = \frac{x_0 (y_0 + a) - x_1 y_0 + K (x_1 - x_0) \sqrt{y_0^2 + (2 a y_0 + a^2)\,R}}{a}, \qquad K = \pm 1. \tag{5}$$

When R = 0, x must equal $x_0$. Hence let R = 0 in equation 5:

$$x = x_0 + \frac{y_0}{a}\,(x_1 - x_0)(K - 1).$$

Hence K must be plus one.

When R = 1, x must equal $x_1$. Hence let R = 1 in equation 5:

$$x = x_1 + \frac{y_0 + a}{a}\,(x_1 - x_0)(K - 1).$$

Hence K = +1 for any $a \neq 0$.
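The algebra above can be checked numerically. The sketch below, with hypothetical function names, evaluates equation 3 directly from A, B and C with the plus sign, compares it with the simplified equation 4, and confirms that substituting the result back into the cdf of equation 2 returns R. It assumes non-negative ordinates and does not cover the horizontal case a = 0.

```python
import math


def coefficients(x0, y0, x1, y1):
    """A, B and C for the line segment through (x0, y0) and (x1, y1)."""
    a = (y1 - y0) / (x1 - x0)
    b = (x1 * y0 - x0 * y1) / (x1 - x0)
    c = a / 2.0 * (x1 ** 2 - x0 ** 2) + b * (x1 - x0)
    return a, b, c


def x_from_equation_3(x0, y0, x1, y1, r):
    """Equation 3 with the plus sign (requires y0 != y1)."""
    a, b, c = coefficients(x0, y0, x1, y1)
    disc = b * b + a * a * x0 * x0 + 2.0 * a * b * x0 + 2.0 * a * c * r
    return (-b + math.sqrt(disc)) / a


def x_from_equation_4(x0, y0, x1, y1, r):
    """Equation 4 with the plus sign (requires y0 != y1)."""
    root = math.sqrt(y0 ** 2 + (y1 ** 2 - y0 ** 2) * r)
    return (x0 * y1 - x1 * y0 + (x1 - x0) * root) / (y1 - y0)


def cdf(x0, y0, x1, y1, x):
    """Equation 2: G(x) for the normalized line segment."""
    a, b, c = coefficients(x0, y0, x1, y1)
    return (a / 2.0 * (x * x - x0 * x0) + b * (x - x0)) / c


if __name__ == "__main__":
    for r in (0.0, 0.25, 0.5, 1.0):
        x = x_from_equation_4(7.0, 0.0, 15.0, 80.0, r)
        print(r, x, cdf(7.0, 0.0, 15.0, 80.0, x))
```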

The case where $y_0 = y_1$, or a = 0, is treated differently. In this case

$$A = 0, \qquad B = y_0,$$

$$C = \frac{A}{2}\,(x_1^2 - x_0^2) + B\,(x_1 - x_0) = y_0 (x_1 - x_0).$$

Hence

$$G(x) = \frac{B}{C}\,(x - x_0) = \frac{y_0 (x - x_0)}{y_0 (x_1 - x_0)} = \frac{x - x_0}{x_1 - x_0},$$

as was expected. Now letting R = G(x) and solving for x it is seen that

$$x = x_0 + (x_1 - x_0)\,R. \tag{6}$$

In the subroutine that follows, equation 6 is statement 3, while equation 4 is
represented by the statements headed by statement 4.

      SUBROUTINE RNDNR (N,X,Y,C,T)
      DIMENSION X(10),Y(10),C(10)
C     R IS THE UNIFORMLY DISTRIBUTED RANDOM NUMBER
      R=RAN(1.)/10.
C     FIND THE FIRST CUMULATIVE BREAKPOINT C(I) THAT EXCEEDS R
      DO 1 I=1,N
      IF(R-C(I)) 2,1,1
    1 CONTINUE
C     W IS THE POSITION OF R WITHIN SEGMENT I, FROM 0 TO 1
    2 W=(R-C(I-1))/(C(I)-C(I-1))
      IF(Y(I)-Y(I-1)) 4,3,4
C     HORIZONTAL SEGMENT (EQUATION 6)
    3 T=X(I-1)+(X(I)-X(I-1))*W
      RETURN
C     SLOPING SEGMENT (EQUATION 4, PLUS SIGN)
    4 T=X(I-1)*Y(I)-X(I)*Y(I-1)
      T=T+(X(I)-X(I-1))*SQRTF(Y(I-1)**2+(Y(I)**2-Y(I-1)**2)*W)
      T=T/(Y(I)-Y(I-1))
      RETURN
      END


C     RANDOM NUMBER GENERATOR CHECK
      DIMENSION X(10),Y(10),A(10),C(10),NN(101)
C     THE SECOND FIELD OF FORMAT 100 IS ILLEGIBLE IN THE LISTING.
C     I3 IS ASSUMED HERE.
  100 FORMAT (I2,I3)
  101 FORMAT (20F4.0)
  200 FORMAT(1H ,I2,2XI4)
  201 FORMAT(I4)
C     N = NUMBER OF END POINTS, K = NUMBER OF HISTOGRAM INTERVALS
    1 READ 100,N,K
      READ 101,(X(I),Y(I),I=1,N)
C     AREA UNDER EACH SEGMENT AND CUMULATIVE BREAKPOINTS C(I)
      A(1)=0
      AT=0
      DO 2 I=2,N
      A(I)=(X(I)-X(I-1))*(Y(I)+Y(I-1))/2.
    2 AT=AT+A(I)
      C(1)=0
      DO 3 I=2,N
    3 C(I)=C(I-1)+A(I)/AT
C     Z = WIDTH OF ONE HISTOGRAM INTERVAL
      Z=K
      Z=(X(N)-X(1))/Z
      L=K+1
      DO 4 I=1,L
    4 NN(I)=0
C     DRAW 1000 VALUES AND COUNT THEM BY INTERVAL
      DO 6 I=1,1000
      IF(SENSE SWITCH 1)8,9
    8 TYPE 201,I
    9 CALL RNDNR (N,X,Y,C,T)
      DO 5 J=1,L
      B=J-1
      IF(T-(X(1)+Z*B))6,5,5
    5 CONTINUE
    6 NN(J)=NN(J)+1
      DO 7 I=1,L
    7 PRINT 200,I,NN(I)
      PAUSE
      GO TO 1
      END
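For readers without access to an IBM 1620, the sampling subroutine and its check can be approximated in Python. This is a sketch, not a transcription: Python's `random` module stands in for the 1620 `RAN` routine, indices start at zero, the segment search stops at the last segment instead of running off the end of the table, and the function names are hypothetical.

```python
import math
import random


def cumulative_breakpoints(x, y):
    """Normalized cumulative area at each end point (the C array)."""
    areas = [(x[i] - x[i - 1]) * (y[i] + y[i - 1]) / 2.0
             for i in range(1, len(x))]
    total = sum(areas)
    c = [0.0]
    for area in areas:
        c.append(c[-1] + area / total)
    c[-1] = 1.0  # guard against rounding in the running sum
    return c


def rndnr(x, y, c, r):
    """Map a uniform number r in [0, 1] to a time on the line-segment pdf."""
    i = 1
    while i < len(c) - 1 and r >= c[i]:
        i += 1
    w = (r - c[i - 1]) / (c[i] - c[i - 1])
    if y[i] == y[i - 1]:
        return x[i - 1] + (x[i] - x[i - 1]) * w  # equation 6
    root = math.sqrt(y[i - 1] ** 2 + (y[i] ** 2 - y[i - 1] ** 2) * w)
    return (x[i - 1] * y[i] - x[i] * y[i - 1]
            + (x[i] - x[i - 1]) * root) / (y[i] - y[i - 1])  # equation 4


def check(x, y, intervals, draws=1000, seed=1):
    """Count generated times in equal-width intervals across [x[0], x[-1]]."""
    rng = random.Random(seed)
    c = cumulative_breakpoints(x, y)
    width = (x[-1] - x[0]) / intervals
    counts = [0] * intervals
    for _ in range(draws):
        t = rndnr(x, y, c, rng.random())
        counts[min(int((t - x[0]) / width), intervals - 1)] += 1
    return counts


if __name__ == "__main__":
    print(check([7.0, 15.0, 19.0], [0.0, 80.0, 0.0], 12))
```

The printed counts should rise toward the interval containing the apex and fall away after it, which is the behavior the check program is meant to confirm.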



III. DATA ANALYSIS


The measure to be used for evaluating information retrieval systems at
present is response time, which has been defined as the time it takes for the
system to respond to a given request. The purpose of the simulation model is
to provide representative system response time data. Based on this data an
evaluator would want to know at least two things about the system:

1. whether or not the system is acceptable according to a given time
constraint;

2. whether or not the system can be improved to make it acceptable,
or better.

To date the efforts of this task have been restricted to the first question.
There are two problems associated with question one: (1) it might be desirable
to know in what response time T an evaluator can have a P percent confidence
for the given system, and (2) the evaluator would want to know whether he should
accept or reject the system if he must be P percent confident that the response
time is less than some required time T. These two problems can be
answered by means of the two programs which follow.

1. DATA REDUCTION

The general flow of data reduction is shown in Figure 9. The
purpose of the data reduction program is to develop a histogram (response time
vs. frequency of occurrence) and pertinent statistics by examining the simulation
output cards. This output can be examined in several ways, some of which are
as follows.


a. Obtain a complete summary analysis of the simulation output. In this
case all output cards are examined and they produce one summary
listing.

b. Obtain partial summaries; that is, the data reduction program would
examine 100 iterations and produce a summary, and then examine the
next 100 and produce a composite summary of the entire 200 iterations,
and so on.

c. The time histogram developed can be varied by considering different
time interval subdivisions.


Figure 10 is the data reduction program flow chart. As will be seen in the
program listing which follows, D1 and D2 represent all of the data on two separate
output cards.


The program requires some preliminary data prior to accepting the simulation
output data. This preliminary data (data format requirements) is on three
cards. The first card has the number of iterations (simulation output cards) to be
examined and the desired number of summaries (b, above). The second card
contains basic data for the histogram: the number of time intervals desired, the
length of each interval (which is the same for all intervals), and the initial time.
The third card has data about the number of query types, the maximum number of
queries, and the number of input and output devices involved in the simulation.

The simulation output cards follow these three cards, and a blank card follows
the output cards (another program requirement at present).
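The three header cards are fixed-width records. Assuming the field layouts that appear in the program listing, FORMAT(2I5), FORMAT(3I5) and FORMAT(4I3), they could be decoded as in the following Python sketch. The helper name is hypothetical, and treating a blank field as zero is an assumption about how the card reader behaves.

```python
def read_int_fields(card, width, count):
    """Split a card image into `count` integer fields of `width` columns."""
    fields = []
    for i in range(count):
        text = card[i * width:(i + 1) * width]
        fields.append(int(text) if text.strip() else 0)  # blank read as zero
    return fields


if __name__ == "__main__":
    nr, nn = read_int_fields("  500    5", 5, 2)             # iterations, summaries
    nti, li, i_s = read_int_fields("   10   50  150", 5, 3)  # histogram layout
    m, ni, no, nq = read_int_fields("  5  2  3  3", 3, 4)    # types, devices, queries
    print(nr, nn, nti, li, i_s, m, ni, no, nq)
    print(nr // nn)  # iterations covered by each partial summary
```

The values shown are those of the worked example in section 2.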


The program output consists of three pages (illustrated in section 2). On the
first page (labeled page 1), which is predominantly time data, will be found
the following information:

1. The number of iterations considered;

2. The average time per iteration;

3. The time variance;

4. The time standard deviation;

5. Time histogram data (response time vs. number of occurrences);

6. Time histogram data (response time vs. frequency of occurrence);

7. Time histogram data (response time vs. cumulative frequency of
occurrence).

FIG. 10 - DATA REDUCTION PROGRAM FLOW CHART

The second page (labeled page 2), which is concerned with query data, contains
the following data:

1. Number of queries generated in the previously specified number of
iterations;

2. Average number of queries per iteration;

3. Query occurrence per iteration data;

4. Query type occurrence data.

The last page (labeled page 3) consists of information about the utilization of the
I/O devices.




C     DATA REDUCTION PROGRAM FOR MOD 2
      DIMENSION NQS(11),NT(25),F(25),NTY(10),NTD(10),
     1NTT1(10),NTT2(10),ND1(10),ND2(10)
  100 FORMAT(2I5)
  101 FORMAT(3I5)
  102 FORMAT(4I3)
  103 FORMAT(I4,F10.2,I4,10I3,10I3)
  200 FORMAT(1H1,39HINFORMATION RETRIEVAL SYSTEM SIMULATION,13X,8H(PAGE 
     11))
  201 FORMAT(1H0,23HNUMBER OF ITERATIONS = ,I5)
  202 FORMAT(1H0,9HTIME DATA)
  203 FORMAT(1H ,5X,29HAVERAGE TIME PER ITERATION = ,F10.2,1X,7HSECONDS)
  204 FORMAT(1H ,5X,11HVARIANCE = ,18X,F10.3)
  205 FORMAT(1H0,18HTIME INTERVAL DATA)
  206 FORMAT(1H0,5X,8HINTERVAL,5X,21HNUMBER OF OCCURRENCES,5X,
     19HFREQUENCY,4X,15HCUMULATIVE PROB)
  207 FORMAT(1H ,5X,I4,1H-,I4,10X,I5,16X,F6.3,10X,F6.3)
  208 FORMAT(1H ,5X,I4,12H AND GREATER,3X,I5,16X,F6.3)
  209 FORMAT(1H1,39HINFORMATION RETRIEVAL SYSTEM SIMULATION,13X,8H(PAGE 
     12))
  210 FORMAT(1H0,10HQUERY DATA)
  211 FORMAT(1H ,5X,19HNUMBER OF QUERIES =,I5)
  212 FORMAT(1H ,5X,41HAVERAGE NUMBER OF QUERIES PER ITERATION =,F6.2)
  213 FORMAT(1H0,17HQUERIES/ITERATION,3X,21HNUMBER OF OCCURRENCES,3X,17H
     1AVERAGE/ITERATION)
  214 FORMAT(1H ,8X,I2,17X,I5,16X,F6.2)
  215 FORMAT(1H ,3X,9HMORE THAN,I3,12X,I5,16X,F6.2)
  216 FORMAT(1H0,10HQUERY TYPE,10X,21HNUMBER OF OCCURRENCES,3X,13HAVERAG
     1E/QUERY)
  217 FORMAT(1H ,8X,I2,17X,I5,15X,F6.2)
  218 FORMAT(1H1,39HINFORMATION RETRIEVAL SYSTEM SIMULATION,13X,8H(PAGE 
     13))
  219 FORMAT(1H0,24HINPUT/OUTPUT DEVICE DATA)
  220 FORMAT(1H0,14HUTILIZATION OF,10X,17HAVERAGE USE/QUERY,3X,21HAVERAG
     1E USE/ITERATION)
  221 FORMAT(1H ,12HINPUT DEVICE,I2,1H=,I4,9X,F6.2,17X,F6.2)
  222 FORMAT(1H ,13HOUTPUT DEVICE,I2,1H=,I4,9X,F6.2,17X,F6.2)
  223 FORMAT(1H ,5X,20HSTANDARD DEVIATION =,9X,F10.3)
C     HEADER CARDS
    1 READ 100,NR,NN
      L=NR/NN
      READ 101,NTI,LI,IS
      READ 102,M,NI,NO,NQ
      N=2*NQ+1
      K=NI+NO
C     CLEAR THE ACCUMULATORS AND SET THE INTERVAL BOUNDARIES F(I)
      SS=0
      SSQ=0
      NTQ=0
      DO 2 I=1,N
    2 NQS(I)=0
      DO 3 I=1,NTI
      NT(I)=0
    3 F(I)=IS+LI*(I-1)
      DO 4 I=1,M
    4 NTY(I)=0
      DO 5 I=1,K
    5 NTD(I)=0
C     D1 IS THE CARD HELD, D2 IS THE CARD JUST READ
      READ 103,N1,X,NQ1,(NTT1(I),I=1,M),(ND1(I),I=1,K)
      J=1
    6 READ 103,N2,Y,NQ2,(NTT2(I),I=1,M),(ND2(I),I=1,K)
      IF(N2)7,11,7
    7 IF(N1-N2)12,8,12
    8 N1=N2
      X=Y
      DO 9 I=1,M
    9 NTT1(I)=NTT2(I)
      DO 10 I=1,K
   10 ND1(I)=ND2(I)
      NQ1=NQ2
      GO TO 6
C     A BLANK CARD MARKS THE END OF THE DATA
   11 J=2
C     ACCUMULATE THE DATA OF THE CARD HELD
   12 SS=SS+X
      SSQ=SSQ+X**2
      DO 13 I=1,M
      IF(NQ1-I)13,14,13
   13 CONTINUE
   14 NQS(I)=NQS(I)+1
      NTQ=NTQ+NQ1
      DO 15 I=1,NTI
      IF(X-F(I))16,15,15
   15 CONTINUE
   16 NT(I)=NT(I)+1
      DO 17 I=1,M
   17 NTY(I)=NTY(I)+NTT1(I)
      DO 18 I=1,K
   18 NTD(I)=NTD(I)+ND1(I)
C     PRINT A SUMMARY AFTER EVERY L ITERATIONS
      DO 19 I=1,NN
      IF(N1-I*L)8,20,19
   19 CONTINUE
   20 W=N1
      XB=SS/W
      XV=SSQ/W-XB**2
      XS=SQRTF(XV)
C     PAGE 1 - TIME DATA
      PRINT 200
      PRINT 201,N1
      PRINT 202
      PRINT 203,XB
      PRINT 204,XV
      PRINT 223,XS
      PRINT 205
      PRINT 206
      II=0
      FK=NT(1)
      FK=FK/W
      CP=FK
      PRINT 207,II,F(1),NT(1),FK,CP
      II=NTI-1
      DO 21 I=2,II
      MM=I-1
      FK=NT(I)
      FK=FK/W
      CP=CP+FK
   21 PRINT 207,F(MM),F(I),NT(I),FK,CP
      FK=NT(NTI)
      FK=FK/W
      PRINT 208,F(II),NT(NTI),FK
C     PAGE 2 - QUERY DATA
      PRINT 209
      PRINT 210
      PRINT 211,NTQ
      G=NTQ
      G1=G/W
      PRINT 212,G1
      PRINT 213
      II=N-1
      DO 22 I=1,II
      H=NQS(I)
      H=H/W
   22 PRINT 214,I,NQS(I),H
      H=NQS(N)
      H=H/W
      PRINT 215,II,NQS(N),H
      PRINT 216
      DO 23 I=1,M
      H=NTY(I)
      H=H/G
   23 PRINT 217,I,NTY(I),H
C     PAGE 3 - INPUT/OUTPUT DEVICE DATA
      PRINT 218
      PRINT 219
      PRINT 220
      DO 24 I=1,NI
      G1=NTD(I)
      G2=G1/G
      G3=G1/W
   24 PRINT 221,I,NTD(I),G2,G3
      DO 25 I=1,NO
      K1=NI+I
      G1=NTD(K1)
      G2=G1/G
      G3=G1/W
   25 PRINT 222,I,NTD(K1),G2,G3
      GO TO (8,26),J
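The time statistics that the program accumulates (sum, sum of squares, interval counts) can be expressed compactly in Python. This sketch follows the formulas visible in the listing (mean = SS/W, variance = SSQ/W minus the squared mean, boundaries F(I) = IS + LI*(I-1)), but it is not a transcription. The function name is hypothetical, and any time at or beyond the last boundary is counted in the final open interval.

```python
import bisect
import math


def reduce_times(times, n_intervals, interval_length, initial_time):
    """Mean, variance, standard deviation and histogram of response times."""
    w = float(len(times))
    mean = sum(times) / w
    variance = sum(t * t for t in times) / w - mean ** 2
    # Boundaries between intervals; the last interval is open-ended.
    edges = [initial_time + interval_length * i for i in range(n_intervals - 1)]
    counts = [0] * n_intervals
    for t in times:
        counts[bisect.bisect_right(edges, t)] += 1
    frequency = [c / w for c in counts]
    cumulative = []
    running = 0.0
    for f in frequency:
        running += f
        cumulative.append(running)
    return {
        "mean": mean,
        "variance": variance,
        "std": math.sqrt(max(variance, 0.0)),
        "edges": edges,
        "counts": counts,
        "frequency": frequency,
        "cumulative": cumulative,
    }


if __name__ == "__main__":
    summary = reduce_times([100.0, 160.0, 210.0, 210.0, 560.0], 10, 50, 150)
    print(summary["mean"], summary["variance"], summary["counts"])
```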




2.  EXAMPLE  -  DATA  REDUCTION  PROGRAM  OUTPUT 

The form of the data reduction program output can be illustrated by
considering the numerical output data given in Appendix B. There were 500 samples
generated in this particular run. The three required header cards for the data
reduction program are

CARD   VALUES

1      NR = 500, NN = 5
2      NTI = 10, LI = 50, IS = 150
3      M = 5, NI = 2, NO = 3, NQ = 3

As previously mentioned, the header cards are followed by the iteration cards,
the last of which is followed by a blank card. The following pages indicate the
output produced by the data reduction program. For convenience, histograms
have been prepared (Figures 11 through 15) and follow the output example. It
should be mentioned again that this is a numerical example used to show the
form and substance of the program output and meant to illustrate the
capabilities of the program.

Using the example cited above, assume that someone wants to know in
what response time of the system he can be, say, 90% confident. An examination
of Figure 15 (b) shows that the system response time will be less than 450
seconds about 90% of the time. Hence, based on this data he can expect that
90% of the time he will get a response to a request from this system in something
less than seven and a half minutes.
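The same reading can be taken from the tabulated cumulative data instead of the figure. The Python sketch below uses the interval counts printed for the first 100 iterations in the sample output (not the 500-sample data plotted in Figure 15) and returns the first interval boundary at which the cumulative count reaches the requested percentage. The function name is hypothetical.

```python
def time_for_confidence(upper_edges, counts, percent):
    """Smallest interval boundary whose cumulative count reaches `percent`.

    `counts` may include one extra entry for the open-ended last interval.
    Integer arithmetic avoids floating-point trouble at the threshold.
    """
    total = sum(counts)
    running = 0
    for edge, count in zip(upper_edges, counts):
        running += count
        if running * 100 >= percent * total:
            return edge
    return None  # the requested level is only reached in the open interval


if __name__ == "__main__":
    edges = [150, 200, 250, 300, 350, 400, 450, 500, 550]
    counts_100 = [0, 6, 12, 23, 21, 19, 8, 8, 3, 0]  # first 100 iterations
    print(time_for_confidence(edges, counts_100, 89))
    print(time_for_confidence(edges, counts_100, 90))
```

For these 100 iterations the cumulative proportion at 450 seconds is .89, so a strict 90% requirement moves the answer to the next boundary, 500 seconds. This is why the text says "about 90%".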

As can be seen in the program listing, there is some degree of freedom
allowed in the output format. For example, there can be up to 25 time
subdivisions if desired. The output, in its present form, permits an examination
of page 1 for time data, page 2 for query data, and page 3 for I/O device data.
INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 1)

NUMBER OF ITERATIONS =   100

TIME DATA
     AVERAGE TIME PER ITERATION =     327.A3 SECONDS
     VARIANCE =                     7300.010
     STANDARD DEVIATION =             85.440

TIME INTERVAL DATA

     INTERVAL     NUMBER OF OCCURRENCES     FREQUENCY    CUMULATIVE PROB
        0- 150               0                 0.000          0.000
      150- 200               6                  .060           .060
      200- 250              12                  .120           .180
      250- 300              23                  .230           .410
      300- 350              21                  .210           .620
      350- 400              19                  .190           .810
      400- 450               8                  .080           .890
      450- 500               8                  .080           .970
      500- 550               3                  .030          1.000
      550 AND GREATER        0                 0.000

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 2)

QUERY DATA
     NUMBER OF QUERIES =  163
     AVERAGE NUMBER OF QUERIES PER ITERATION =  1.63

QUERIES/ITERATION   NUMBER OF OCCURRENCES   AVERAGE/ITERATION
        1                    52                    .52
        2                    33                    .33
        3                    15                    .15
        4                     0                   0.00
        5                     0                   0.00
        6                     0                   0.00
   MORE THAN  6               0                   0.00

QUERY TYPE          NUMBER OF OCCURRENCES   AVERAGE/QUERY
        1                    71                    .43
        2                    50                    .30
        3                    24
        4                    11                    .06
        5

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 3)

INPUT/OUTPUT DEVICE DATA

UTILIZATION OF            AVERAGE USE/QUERY   AVERAGE USE/ITERATION
INPUT DEVICE   1=  124           .76                  1.24
INPUT DEVICE   2=   39           .23                   .39
OUTPUT DEVICE  1=  100           .61                  1.00
OUTPUT DEVICE  2=   25           .15                   .25
OUTPUT DEVICE  3=   38           .23                   .38
INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 1)

NUMBER OF ITERATIONS =   200

TIME DATA
     AVERAGE TIME PER ITERATION =     325.89 SECONDS
     VARIANCE =                     7419.110
     STANDARD DEVIATION =             86.134

TIME INTERVAL DATA

     INTERVAL     NUMBER OF OCCURRENCES     FREQUENCY    CUMULATIVE PROB
        0- 150               0                 0.000          0.000
      150- 200              11                  .055           .055
      200- 250              29                  .145           .200
      250- 300              48                  .240           .440
      300- 350              40                  .200           .640
      350- 400              31                  .155           .795
      400- 450              18                  .090           .885
      450- 500              16                  .080           .965
      500- 550               7                  .035
      550 AND GREATER        0                 0.000

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 2)

QUERY DATA
     NUMBER OF QUERIES =  333
     AVERAGE NUMBER OF QUERIES PER ITERATION =  1.66

QUERIES/ITERATION   NUMBER OF OCCURRENCES   AVERAGE/ITERATION
        1                   101                    .50
        2                                          .32
        3
        4                                         0.00
        5                                         0.00
        6                                         0.00
   MORE THAN  6                                   0.00

QUERY TYPE          NUMBER OF OCCURRENCES   AVERAGE/QUERY
        1                   128                    .38
        2                    99                    .29
        3                    56                    .16
        4                    35                    .10
        5                    15

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 3)

INPUT/OUTPUT DEVICE DATA

UTILIZATION OF            AVERAGE USE/QUERY   AVERAGE USE/ITERATION
INPUT DEVICE   1=  238           .71                  1.19
INPUT DEVICE   2=   95           .28                   .47
OUTPUT DEVICE  1=  201           .60                  1.00
OUTPUT DEVICE  2=   39           .11                   .19
OUTPUT DEVICE  3=   93           .27                   .46

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 1)

NUMBER OF ITERATIONS =   300

TIME DATA
     AVERAGE TIME PER ITERATION =     327.86 SECONDS
     VARIANCE =                     7768.240
     STANDARD DEVIATION =             88.137

TIME INTERVAL DATA

     INTERVAL     NUMBER OF OCCURRENCES     FREQUENCY    CUMULATIVE PROB
        0- 150               0                 0.000          0.000
      150- 200              19                  .063           .063
      200- 250              42                  .140           .203
      250- 300              68                  .226           .429
      300- 350              60                  .200           .629
      350- 400              47                  .156           .786
      400- 450              30                  .100           .886
      450- 500              23                  .076           .963
      500- 550                                  .033
      550 AND GREATER        1                  .003

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 2)

QUERY DATA
     NUMBER OF QUERIES =  506
     AVERAGE NUMBER OF QUERIES PER ITERATION =  1.68

QUERIES/ITERATION   NUMBER OF OCCURRENCES   AVERAGE/ITERATION
        1                   153                    .51
        2                    88                    .29
        3                    59                    .19
        4                     0                   0.00
        5                     0                   0.00
        6                     0                   0.00
   MORE THAN  6               0

QUERY TYPE          NUMBER OF OCCURRENCES   AVERAGE/QUERY
        1                   197                    .38
        2                   164                    .32
        3                    74                    .14
        4                    45                    .08
        5                    26                    .05

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 3)

INPUT/OUTPUT DEVICE DATA

UTILIZATION OF            AVERAGE USE/QUERY   AVERAGE USE/ITERATION
INPUT DEVICE   1=  365           .72                  1.21
INPUT DEVICE   2=  141           .27                   .47
OUTPUT DEVICE  1=  309           .61                  1.03
OUTPUT DEVICE  2=   58           .11                   .19
OUTPUT DEVICE  3=  139           .27                   .46
INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 1)

NUMBER OF ITERATIONS =   400

TIME DATA
     AVERAGE TIME PER ITERATION =     329.05 SECONDS
     VARIANCE =                     7357.300
     STANDARD DEVIATION =             85.774

TIME INTERVAL DATA

     INTERVAL     NUMBER OF OCCURRENCES     FREQUENCY    CUMULATIVE PROB
        0- 150               1                  .002           .002
      150- 200              22                  .055           .057
      200- 250              52                                 .187
      250- 300              91                  .227           .415
      300- 350              83                  .207           .622
      350- 400              68                  .170           .792
      400- 450              41                  .102           .895
      450- 500              28                  .070           .965
      500- 550              13                  .032           .997
      550 AND GREATER        1                  .002

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 2)

QUERY DATA
     NUMBER OF QUERIES =  676
     AVERAGE NUMBER OF QUERIES PER ITERATION =  1.69

QUERIES/ITERATION   NUMBER OF OCCURRENCES   AVERAGE/ITERATION
        1                   201                    .50
        2
        3                                          .19
        4                                         0.00
        5                                         0.00
        6                                         0.00
   MORE THAN  6                                   0.00

QUERY TYPE          NUMBER OF OCCURRENCES   AVERAGE/QUERY
        1                   259                    .38
        2                   217
        3                    99                    .14
        4                    67                    .09
        5                    34                    .05

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 3)

INPUT/OUTPUT DEVICE DATA

UTILIZATION OF            AVERAGE USE/QUERY   AVERAGE USE/ITERATION
INPUT DEVICE   1=  482           .71                  1.20
INPUT DEVICE   2=  194           .28                   .48
OUTPUT DEVICE  1=  401           .59                  1.00
OUTPUT DEVICE  2=                .12
OUTPUT DEVICE  3=  190           .28                   .47
V  - 
INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 1)

NUMBER OF ITERATIONS =   500

TIME DATA
     AVERAGE TIME PER ITERATION =     330.50 SECONDS
     VARIANCE =                     7618.840
     STANDARD DEVIATION =             87.285

TIME INTERVAL DATA

     INTERVAL     NUMBER OF OCCURRENCES     FREQUENCY    CUMULATIVE PROB

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 2)

QUERY DATA
     NUMBER OF QUERIES =  844
     AVERAGE NUMBER OF QUERIES PER ITERATION =

QUERIES/ITERATION   NUMBER OF OCCURRENCES   AVERAGE/ITERATION
        1                   251                    .50
        2                   154                    .30
        3                    95                    .19

QUERY TYPE          NUMBER OF OCCURRENCES   AVERAGE/QUERY

INFORMATION RETRIEVAL SYSTEM SIMULATION             (PAGE 3)

INPUT/OUTPUT DEVICE DATA

UTILIZATION OF            AVERAGE USE/QUERY   AVERAGE USE/ITERATION
INPUT DEVICE   1=  605           .71                  1.21
INPUT DEVICE   2=  239           .28
OUTPUT DEVICE  1=  500           .59
OUTPUT DEVICE  2=  105           .12                   .21
OUTPUT DEVICE  3=  239           .28                   .47

FIG. 15 - TIME HISTOGRAMS (PANELS (A) AND (B); HORIZONTAL AXIS: TIME IN
SECONDS, 100 TO 600; 500 SAMPLES)


In the analysis problem there are essentially only two questions the system
evaluator would want to have answered:

1. Given a response time T, he would want to know the probability that
the system could respond in this time.

2. Given a system acceptance criterion, he would want to know with P
percent confidence that the response time of the system was less than
some specified time T.
The analysis program, just as the data reduction program, examines the
simulation output cards (see Figure 16). To do this, two header cards are
necessary. The first header card has data about T and P (as defined above).

FIG. 16 - ANALYSIS PROCESS (SIMULATION DATA CARDS AND ACCEPTANCE CRITERIA ARE
READ BY THE COMPUTER UNDER THE ANALYSIS PROGRAM, WHICH PRODUCES THE RESPONSE
TIME ESTIMATOR AND THE CONFIDENCE LEVEL DECISION)


If P is equal to zero, the program interprets this as meaning question one
(above) is to be answered; otherwise (i.e., P ≠ 0) question two. The second
header card has on it the number of iterations, N, to be examined. The header
cards are followed by N iteration cards, the last of which is followed by a blank
card. The procedure followed by the program is shown in Figure 17. Specific
input formats appear in the program listing.

The output for question one (given T, find P) is simply the printed statement

PROBABILITY TIME LESS THAN (T) SEC. IS (P)

where numerical values replace T and P. For example, consider the simulation
output data of Appendix B. Suppose the evaluator wants to know the confidence
level P for a response time of 500 seconds. Since this is question one, the
first header card would have T = 500 and P = 0. The second header card would
have the number of iterations to be examined, in this case N = 500. These
header cards are followed by the simulation output cards (again, these are
followed by a blank card). After the cards have been read by the computer,
the answer printed out would be

PROBABILITY TIME LESS THAN 500.00 SEC. IS .958

This answer is interpreted as meaning that 95.8% of the time this system would
respond to a given request for information in less than 500 seconds.

The output of question two consists of two statements

(X) SYSTEM SINCE

PROBABILITY TIME LESS THAN (T) SEC. IS (P)

where (X) will be either ACCEPT or REJECT, and there will be numerical
values, as with question one, for T and P. Suppose it is required that for an
acceptable system the response time must be less than 500 seconds with a
confidence level of 90%. The printed output answer would be

ACCEPT SYSTEM SINCE

PROBABILITY TIME LESS THAN 500.00 SEC. IS .958

As a second example, suppose it is necessary to be sure that 90% of the time
the response time is less than 400 seconds. In this case (for the same data)
the printed output would be

REJECT SYSTEM SINCE

PROBABILITY TIME LESS THAN 400.00 SEC. IS .790

Marginal cases, such as a rejection because the calculated P value is only
slightly less than the required confidence level, should be reviewed by the
user, who makes the final decision on the system's acceptance or rejection.
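
Both questions rest on one statistic: the fraction of simulated iterations
whose response time does not exceed T. The Python sketch below is a modern
illustration of that calculation, not the report's FORTRAN II program. The
function names and the sample times are hypothetical, and the comparison
"does not exceed T" follows the program listing as it is read here. Python
prints a leading zero (0.900) where the report's output shows none (.958).

```python
from typing import Sequence


def probability_within(times: Sequence[float], t_limit: float) -> float:
    """Fraction of simulated response times that do not exceed t_limit."""
    if not times:
        raise ValueError("at least one response time is required")
    hits = sum(1 for t in times if t <= t_limit)
    return hits / len(times)


def analyze(times: Sequence[float], t_limit: float,
            p_required: float = 0.0) -> str:
    """Answer question one when p_required is zero, else question two."""
    pr = probability_within(times, t_limit)
    statement = f"PROBABILITY TIME LESS THAN {t_limit:.2f} SEC. IS {pr:.3f}"
    if p_required == 0.0:
        return statement
    verdict = "ACCEPT" if pr >= p_required else "REJECT"
    return f"{verdict} SYSTEM SINCE\n{statement}"


# Hypothetical response times in seconds (not the Appendix B data).
sample_times = [120.0, 250.5, 310.0, 399.9, 405.0,
                480.0, 495.0, 510.0, 300.0, 275.0]

print(analyze(sample_times, 500.0))                   # question one
print(analyze(sample_times, 400.0, p_required=0.90))  # question two
```

Passing the required confidence as a fraction (0.90 rather than 90) mirrors
the F6.3 field used for P on the first header card.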
C     ANALYSIS PROGRAM
  100 FORMAT(F8.2,2X,F6.3)
  101 FORMAT(I5)
  102 FORMAT(I4,F10.2)
  200 FORMAT(36HENTER SIMULATION CARDS - PRESS START)
C     THE CONTINUATION OF FORMAT 201 IS MISSING FROM THE LISTING.
C     THE FIELD F6.3 FOR THE PROBABILITY IS AN ASSUMPTION.
  201 FORMAT(1H0,26HPROBABILITY TIME LESS THAN,F8.2,2X,4HSEC.,3H IS,
     1F6.3)
  202 FORMAT(1H0,19HACCEPT SYSTEM SINCE)
  203 FORMAT(1H0,19HREJECT SYSTEM SINCE)
C     FIRST HEADER CARD. P = 0 SELECTS QUESTION ONE (L = 1).
 1000 READ 100,T,P
      IF(P)2,1,2
    1 L = 1
      GO TO 3
    2 L = 2
    3 TYPE 200
      PAUSE
C     SECOND HEADER CARD. N = NUMBER OF ITERATIONS.
      READ 101,N
      PR = 0.
      READ 102,N1,T1
C     READ THE NEXT CARD. A BLANK CARD (N2 = 0) ENDS THE DECK.
    4 READ 102,N2,T2
      IF(N2)5,7,5
    5 IF(N1-N2)7,6,7
    6 N1 = N2
      T1 = T2
      GO TO 4
C     COUNT THE ITERATION IF ITS TIME DOES NOT EXCEED T.
    7 IF(T-T1)9,8,8
    8 PR = PR+1.
    9 IF(N1-N)6,10,10
   10 W = N
      PR = PR/W
      IF(L-1)12,11,12
   11 PRINT 201,T,PR
      PAUSE
      GO TO 1000
C     QUESTION TWO. COMPARE PR WITH THE REQUIRED P.
   12 IF(PR-P)14,13,13
   13 PRINT 202
      GO TO 11
   14 PRINT 203
      GO TO 11
      END
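
The card-reading loop of the listing (statements 4 through 9) can be read as
follows: consecutive cards carrying the same iteration number overwrite one
another, so only the last time for each iteration is counted, and reading
ends at a blank card or once iteration N has been recorded. The Python sketch
below illustrates that reading. It is an interpretation of a partly illegible
listing, not a transcription; the function name and the sample deck are
hypothetical, and whether the simulation actually emits several cards per
iteration is an assumption.

```python
from typing import Iterable, List, Optional, Tuple


def final_times(cards: Iterable[Tuple[int, float]],
                n_iterations: int) -> List[float]:
    """Collapse (iteration, time) cards to one response time per iteration."""
    times: List[float] = []
    current_n: Optional[int] = None
    current_t = 0.0
    for n, t in cards:
        if n == 0:                      # blank card marks the end of the deck
            break
        if current_n is not None and n != current_n:
            times.append(current_t)     # previous iteration is complete
            if current_n >= n_iterations:
                current_n = None        # requested iterations all recorded
                break
        current_n, current_t = n, t     # later card replaces earlier one
    if current_n is not None:
        times.append(current_t)         # flush the iteration still pending
    return times


# Hypothetical deck: iteration 1 appears twice, a blank card ends the deck.
deck = [(1, 100.0), (1, 150.0), (2, 300.0), (3, 50.0), (0, 0.0), (4, 999.0)]
print(final_times(deck, 3))
```

The resulting list of times is what the probability count is then taken over.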
IV.  SUMMARY AND RECOMMENDATIONS


To date, our primary goal has been the development of a basis for evaluating
information retrieval systems, based on the time it takes for the system to
respond to a given request for information. Recent discussions of the model
substantiate the choice and use of response time as a primary measure for
simulating and evaluating retrieval systems. The model is still in the early
stages of development. Its logic has been developed along with two variations
in the model (each of which has been programmed and run on our IBM 1620
computer). The evaluation aspect of the problem has been examined and two
programs have been developed which use the simulation model's output. The
first of these is a data reduction program which produces a summary of
pertinent statistics obtained from the simulation run. The second is an
analysis program which can be considered a model in the sense that once the
user has decided on the acceptance level of the system, the model can examine
the simulated data and determine whether or not the given system is accepted
by the specified standards.

In the coming year the existing models will be refined and extended by
including more equipment characteristics (that is, by introducing various
equipment parameters such as card read rate, tape density, word size, etc.,
to be used in the simulation) as well as including query, record and output
characteristics. This work should complete the simulation model of the
computer-based information retrieval system.

After the model has been completed, a model of a manual information
retrieval system will be developed, in which the data may be stored as hard
copy and retrieved entirely by human operation. Equipment characteristics to
be included in this case are such factors as storage capacity and access time.
Other factors which might be included are cataloguing procedures (what goes
where), query variations (ways of asking for the same thing), and so on. When
developed, the model will be programmed and run on HRB-Singer's IBM 1620
computer. Modifications to the data reduction and analysis programs will be
made where necessary.

Once the manual system model has been developed, it will be integrated
with the computer-based retrieval model into what could be called a General
Information Retrieval System Simulation (GIRSS) model, or G model. It would
be this model which would find the greatest application since most information
retrieval systems are a composite of what has been called manual and computer
systems. The G model could be extended to the more general data processing
systems or it could be tailored to a specific system, as shown in Figure 18.
This would complete the major work on information retrieval system evaluation
using the response time measure.


It is a rare system today that can be judged solely on a single criterion. If
there were only one system which could do the retrieval job in the required
time, then a user would have little choice in deciding which system he should
purchase. However, present-day technology allows the user to choose and
tailor the procedures and components of his system as he sees fit -- in fact,
the problem now is which procedures and which equipment to choose. In a sense,
the user has an allocation problem: he has a given amount of resources (cash,
time available for a response, space) and he must choose the best possible fit
of equipment and procedures to make up his system. The present and future
work will aid the user (or manager) in these decisions. The manager can use
the G model, for example, to determine whether or not the system he has
selected will satisfy his time requirements [illegible] he has. Models of the
system using other evaluation (decision) measures will also be of value to
him. It is toward these other measures that this task will proceed; the
procedure is diagrammed in Figure 19.


A future goal of this task could be the development of an information
retrieval system model which would be based on those measures the user must
consider. The user would specify, for example, his cost constraints, response
time constraints, space restrictions, and data volume considerations and enter
these data into the model. Also available for entry would be state-of-the-art
equipment characteristics and normal event sequences for retrieval systems,
along with their corresponding time distributions, cost/operation, and other
data. The model could then select and recommend the best (i.e., optimum in
some sense) fit under the given constraints. Whether or not this goal is
attainable at present, it does provide both motivation and direction for the
work to be performed by this task in the future.


FIG. 19 - ITERATIVE PROCEDURE FOR THE SIMULATION AND EVALUATION
OF INFORMATION RETRIEVAL SYSTEMS
(Legible box labels: SELECT MEASURE, DEVELOP SYSTEM MODEL, REFINE MODEL.)


APPENDIX A


The data given in this section show the numerical form of the input for
Mod II based on the example described in section 2, Chapter II. This input is
composed of simulation data and system data. The simulation data at present is
simply the number of iterations to be considered. The system data is a
numerical description of the system being simulated.


TABLE 4.  INFORMATION RETRIEVAL SYSTEM SIMULATION
EXAMPLE - INPUT DATA FOR MOD II

Variable       Value          Card Column    Card Number
NR             500            3-5            1
(illegible)    3              3              2
NI             2              6              2
NO             3              9              2
NQ             3              12             2
T(1,1)         (illegible)    (illegible)    3
T(1,2)         1.12           11-14          3
T(2,1)         7.46           1-4            4
T(2,2)         22.71          11-15          4
T(3,1)         5.             1,2            5
T(3,2)         15.            11-13          5
T(4,1)         16.            1-3            6
T(4,2)         31.            11-13          6
T(5,1)         14.            1-3            7
T(5,2)         41.            11-13          7
T(6,1)         10.            1-3            8
T(6,2)         30.            11-13          8
T(7,1)         42.            1-3            9
T(7,2)         185.           11-13          9
T(8,1)         15.            (illegible)    10
T(8,2)         25.            11-13          10
T(9,1)         .0001          1-5            11
T(9,2)         76.8           11-14          11
T(10,1)        .0001          1-5            12
T(10,2)        1.             11,12          12
PN(1)          .5             1,2            13
PN(2)          .3             7,8            13
PN(3)          .2             13,14          13
PT(1)          .4             1,2            14
PT(2)          .3             7,8            14
PT(3)          .15            13-15          14
(illegible)    .1             19,20          14
(illegible)    (illegible)    (illegible)    (illegible)
PE(1,1)        .9             1,2            15
PE(1,2)        .1             7,8            15
PE(2,1)        .8             1,2            16
PE(2,2)        .2             7,8            16
PE(3,1)        .7             1,2            17
PE(3,2)        .3             7,8            17
PE(4,1)        .2             1,2            18
PE(4,2)        .8             7,8            18
PE(5,1)        .1             1,2            19
PE(5,2)        .9             7,8            19
FQ(1,1)        3.             1,2            20
FQ(1,2)        9.             11,12          20
FQ(2,1)        4.             1,2            21
FQ(2,2)        10.            11-13          21
FQ(3,1)        5.             1,2            22
FQ(3,2)        11.            11-13          22
FQ(4,1)        4.             1,2            23
FQ(4,2)        12.            11-13          23
FQ(5,1)        3.             1,2            24
FQ(5,2)        13.            11-13          24
FQ(6,1)        1.             1,2            25
FQ(6,2)        2.             11,12          25
FQ(7,1)        4.             1,2            26
FQ(7,2)        6.             11,12          26
FQ(8,1)        5.             1,2            27
FQ(8,2)        8.             11,12          27
FQ(9,1)        7.             1,2            28
FQ(9,2)        (illegible)    (illegible)    28
FQ(10,1)       9.             1,2            29
TABLE 4.  INFORMATION RETRIEVAL SYSTEM SIMULATION
EXAMPLE - INPUT DATA FOR MOD II (Cont'd)

Variable       Value          Card Column    Card Number
FQ(10,2)       14.            11-13          29
PQF(1,1)       5.             1,2            30
PQF(1,2)       15.            11-13          30
PQF(2,1)       5.             1,2            31
PQF(2,2)       15.            11-13          31
PQF(3,1)       5.             1,2            32
PQF(3,2)       15.            11-13          32
PQF(4,1)       5.             1,2            33
PQF(4,2)       15.            11-13          33
PQF(5,1)       5.             1,2            34
PQF(5,2)       15.            11-13          34
PQF(6,1)       5.             1,2            35
PQF(6,2)       15.            11-13          35
PQF(7,1)       5.             1,2            36
PQF(7,2)       15.            11-13          36
PQF(8,1)       5.             1,2            37
PQF(8,2)       15.            11-13          37
PQF(9,1)       5.             1,2            38
PQF(9,2)       15.            11-13          38
PQF(10,1)      5.             1,2            39
PQF(10,2)      15.            11-13          39
PQC(1,1)       5.             1,2            40
PQC(1,2)       20.            11-13          40
PQC(2,1)       5.             1,2            41
PQC(2,2)       20.            11-13          41
PQC(3,1)       5.             1,2            42
PQC(3,2)       20.            11-13          42
PQC(4,1)       5.             1,2            43
PQC(4,2)       20.            11-13          43
PQC(6,1)       0                             45
PQC(6,2)       0                             45
PQC(7,1)       0                             46
PQC(7,2)       0                             46
PQC(8,1)       0                             47
PQC(8,2)       0                             47
PQC(9,1)       0                             48
PQC(9,2)       0                             48
PQC(10,1)      0                             49
PQC(10,2)      0                             49
CQC(1,1)       5.             1,2            50
CQC(1,2)       10.            11-13          50
CQC(2,1)       5.             1,2            51
CQC(2,2)       10.            11-13          51
CQC(3,1)       5.             1,2            52
CQC(3,2)       10.            11-13          52
CQC(4,1)       5.             1,2            53
CQC(4,2)       10.            11-13          53
CQC(5,1)       (illegible)    1,2            54
CQC(5,2)       10.            11-13          54
CQC(6,1)       0                             55
CQC(6,2)       0                             55
CQC(7,1)       0                             56
CQC(7,2)       0                             56
CQC(8,1)       0                             57
CQC(8,2)       0                             57
CQC(9,1)       0                             58
CQC(9,2)       0                             58
CQC(10,1)      0                             59

TABLE 4.  INFORMATION RETRIEVAL SYSTEM SIMULATION
EXAMPLE - INPUT DATA FOR MOD II (Cont'd)

Variable       Value          Card Column    Card Number
EQ(1,2)        0                             60
EQ(2,1)        0                             61
EQ(2,2)        0                             61
EQ(3,1)        0                             62
EQ(3,2)        0                             62
EQ(4,1)        0                             63
EQ(4,2)        0                             63
EQ(5,1)        0                             64
EQ(5,2)        0                             64
EQ(6,1)        5.             1,2            65
EQ(6,2)        15.            11-13          65
EQ(7,1)        5.             1,2            66
EQ(7,2)        15.            11-13          66
EQ(8,1)        5.             1,2            67
EQ(8,2)        (illegible)    11-13          67
EQ(9,1)        5.             1,2            68
EQ(9,2)        15.            11-13          68
EQ(10,1)       5.             1,2            69
EQ(10,2)       15.            11-13          69
ED(1,1)        0                             70
ED(1,2)        0                             70
ED(2,1)        0                             71
ED(2,2)        0                             71
ED(3,1)        0                             72
ED(3,2)        0                             72
ED(4,1)        0                             73
ED(4,2)        0                             73
ED(5,1)        0                             74
ED(5,2)        0                             74
ED(6,1)        0                             75
ED(6,2)        0                             75
ED(7,1)        (illegible)                   76
ED(7,2)        0                             76
ED(8,1)        0                             77
ED(8,2)        0                             77
ED(9,1)        0                             78
ED(9,2)        0                             78
ED(10,1)       0                             79
ED(10,2)       0                             79
EQC(1,1)       3.             1,2            80
EQC(1,2)       10.            11-13          80
EQC(2,1)       4.             1,2            81
EQC(2,2)       10.            11-13          81
EQC(3,1)       5.             1,2            82
EQC(3,2)       10.            11-13          82
EQC(4,1)       6.             1,2            83
EQC(4,2)       10.            11-13          83
EQC(5,1)       7.             1,2            84
EQC(5,2)       10.            11-13          84
EQC(6,1)       0                             85
EQC(6,2)       0                             85
EQC(7,1)       0                             86
EQC(7,2)       0                             86
EQC(8,1)       0                             87
EQC(8,2)       0                             87
EQC(9,1)       0                             88
EQC(9,2)       0                             88
EQC(10,1)      0                             89
EQC(10,2)      0                             89
TQT(1,1)       0                             90
TQT(1,2)       0                             90
TQT(2,1)       0                             91
TABLE 4.  INFORMATION RETRIEVAL SYSTEM SIMULATION
EXAMPLE - INPUT DATA FOR MOD II (Cont'd)

Variable       Value          Card Column    Card Number
TQT(2,2)       0                             91
TQT(3,1)       0                             92
TQT(3,2)       0                             92
TQT(4,1)       0                             93
TQT(4,2)       0                             93
TQT(5,1)       0                             94
TQT(5,2)       0                             94
TQT(6,1)       0                             95
TQT(6,2)       0                             95
TQT(7,1)       0                             96
TQT(7,2)       0                             96
TQT(8,1)       0                             97
TQT(8,2)       0                             97
TQT(9,1)       0                             98
TQT(9,2)       0                             98
TQT(10,1)      0                             99
TQT(10,2)      0                             99
DE(1)          .6             1,2            100
TO(1,1)        2.             7,8            100
TO(1,2)        12.            17-19          100
DE(2)          .1             1,2            101
TO(2,1)        10.            7-9            101
TO(2,2)        35.            17-19          101
DE(3)          .3             1,2            102
TO(3,1)        5.             7,8            102
TO(3,2)        20.            17-19          102
A              .01            1-3            103
B              .008           7-10           103
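
The Card Column and Card Number entries of Table 4 describe a fixed-column
layout: each value occupies the stated columns of the stated card. A minimal
Python sketch of reading such fields is given below. It is an illustration
only; the helper name and the card images are hypothetical, built from the
Table 4 entries for NR (card 1, columns 3-5) and PN(1) to PN(3) (card 13),
and the report's own input statements are not reproduced here.

```python
def read_field(card: str, first_col: int, last_col: int) -> str:
    """Return the characters in 1-indexed, inclusive card columns."""
    return card[first_col - 1:last_col].strip()


# Hypothetical 80-column card images laid out as Table 4 describes.
card_1 = (" " * 2 + "500").ljust(80)                           # NR, cols 3-5
card_13 = (".5" + " " * 4 + ".3" + " " * 4 + ".2").ljust(80)   # PN(1)..PN(3)

nr = int(read_field(card_1, 3, 5))
pn = [float(read_field(card_13, first, last))
      for first, last in [(1, 2), (7, 8), (13, 14)]]
print(nr, pn)
```

Columns are counted from 1 and are inclusive at both ends, which is why the
slice starts at first_col - 1.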


APPENDIX  B 

SIMULATION  OUTPUT  EXAMPLE 


The following nine pages are a copy of the computer output from the Mod II
simulation, using the input data given in Appendix A, Table 4. The headings of
each data column are as follows:


A   Iteration number
B   Response time
C   Number of queries in the iteration
D   Query data
    D1  Number of times query type 1 was used in the iteration
    D2  Number of times query type 2 was used in the iteration
    D3  Number of times query type 3 was used in the iteration
    D4  Number of times query type 4 was used in the iteration
    D5  Number of times query type 5 was used in the iteration
E   Input device data
    E1  Number of times input device 1 was used in the iteration
    E2  Number of times input device 2 was used in the iteration
F   Output device data
    F1  Number of times output device 1 was used in the iteration
    F2  Number of times output device 2 was used in the iteration
    F3  Number of times output device 3 was used in the iteration


This output example will be discussed further in Chapter 3.
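
The column definitions above map directly onto a small record type. The
Python sketch below is illustrative only: the class and function names are
hypothetical, and it assumes the thirteen values of an iteration are available
as whitespace-separated text, which may not match the card format used by the
report's programs. The sample line is iteration 14 of the output example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IterationRecord:
    iteration: int                 # column A
    response_time: float           # column B, seconds
    n_queries: int                 # column C
    query_type_uses: List[int]     # columns D1..D5
    input_device_uses: List[int]   # columns E1..E2
    output_device_uses: List[int]  # columns F1..F3


def parse_record(line: str) -> IterationRecord:
    """Split one output line into the fields A, B, C, D1-D5, E1-E2, F1-F3."""
    fields = line.split()
    if len(fields) != 13:
        raise ValueError(f"expected 13 fields, got {len(fields)}")
    counts = [int(f) for f in fields[2:]]
    return IterationRecord(
        iteration=int(fields[0]),
        response_time=float(fields[1]),
        n_queries=counts[0],
        query_type_uses=counts[1:6],
        input_device_uses=counts[6:8],
        output_device_uses=counts[8:11],
    )


record = parse_record("14 361.96 3 1 1 1 0 0 2 1 1 1 1")
print(record.response_time, record.query_type_uses)
```

In this row the query-type counts sum to the number of queries (1 + 1 + 1 = 3),
as do the input and output device counts; the same holds in the other fully
legible rows, though the report does not state it as a rule.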


(A question mark denotes a digit or entry that is illegible in the source copy.)

  A       B      C    D1  D2  D3  D4  D5    E1  E2    F1  F2  F3
  1    467.56    2     2   0   0   0   0     2   0     1   1   0
  2    325.22    1     1   0   0   0   0     1   0     0   1   0
  3    230.4?    1     1   0   0   0   0     1   0     0   0   1
  4    302.88    1     0   1   0   0   0     1   0     0   0   1
  5    374.95    2     0   0   1   0   1     1   1     1   1   0
  6    290.75    1     1   0   0   0   0     1   0     0   0   1
  7    29?.14    3     1   1   1   0   0     0   3     2   ?   ?
  8      ?       2     2   0   0   0   0     ?   ?     ?   ?   ?
  9    396.12    2     0   1   1   0   0     1   1     1   0   1
 10      ?       ?     ?   ?   ?   ?   ?     ?   ?     ?   ?   ?
 11    341.57    2     1   1   0   0   0     1   1     2   0   0
 12    252.13    1     1   0   0   0   0     1   0     1   0   0
 13    357.26    1     0   ?   ?   0   0     1   0     1   0   0
 14    361.96    3     1   1   1   0   0     2   1     1   1   1
 15    318.77    1     0   0   0   1   0     0   1     1   ?   0
 16    ?93.38    1     0   1   0   0   0     1   0     1   ?   ?
 17    403.77    2     0   2   0   0   0     2   0     1   0   1
 18    319.24    2     1   1   0   0   0     2   0     2   0   0
 19    300.19    1     0   0   1   0   0     1   0     1   0   0
 20    284.42    1     1   0   0   0   0     1   0     1   0   ?
 21    480.92    3     1   0   1   1   0     2   1     2   1   0
 22    291.94    1     1   0   0   0   0     ?   ?     ?   0   ?
 23    399.95    2     0   0   1   1   0     1   1     1   0   1
 24    468.73    3     1   2   0   0   0     ?   ?     1   ?   ?
 25    260.76    1     0   0   0   0   1     0   1     1   0   0
 26    378.71    1     0   1   0   0   0     1   0     0   ?   ?


27 

352.83 

2 

1 

0 

1 

.  0 

0 

2 

0 

1 

0 
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